Abstract

Various distributed optimization methods have been developed for solving problems which have
simple local constraint sets and whose objective function is the sum of local cost functions of distributed
agents in a network. Motivated by emerging applications in smart grid and distributed sparse regression,
this paper studies distributed optimization methods for solving general problems which have a coupled
global cost function and have inequality constraints. We consider a network scenario where each agent
has no global knowledge and can access only its local mapping and constraint functions. To solve this
problem in a distributed manner, we propose a consensus-based distributed primal-dual perturbation
(PDP) algorithm. In the algorithm, agents employ the average consensus technique to estimate the
global cost and constraint functions via exchanging messages with neighbors, and meanwhile use a
local primal-dual perturbed subgradient method to approach a global optimum. The proposed PDP
method can handle not only smooth inequality constraints but also non-smooth constraints such as
some sparsity promoting constraints arising in sparse optimization. We prove that the proposed PDP
algorithm converges to an optimal primal-dual solution of the original problem, under standard problem
and network assumptions. Numerical examples illustrating the performance of the proposed algorithm
for a sparse regression problem and a demand response control problem in smart grid are also presented.



Index terms— Distributed optimization, constrained optimization, average consensus, primal-dual sub- 
gradient method, regression, smart grid, demand response control 

I. Introduction 

Distributed optimization methods are becoming popular options for solving several engineering 
problems, including parameter estimation, detection and localization problems in sensor networks 
[1], [2], resource allocation problems in peer-to-peer/multi-cellular communication networks [3],
[4], and distributed learning and regression problems in control [5] and machine learning [6]-[8], 
to name a few. In these applications, rather than pooling together all the relevant parameters that 
define the optimization problem, distributed agents, which have access to a local subset of such 
parameters, collaborate with each other to minimize a global cost function, subject to local vari- 
able constraints. Specifically, since it is not always efficient for the agents to exchange across the 
network the local cost and constraint functions, owing to the large size of network, time-varying 
network topology, energy constraints and/or privacy issues, distributed optimization methods that 
utilize only local information and messages exchanged between connecting neighbors have been 
of great interest; see [9]-[16] and references therein. 

Contributions: Different from the existing works [9]-[14] where the local variable constraints 
are usually simple (in the sense that they can be handled via simple projection) and independent 
among agents, in this paper, we consider a problem formulation that has a general set of convex 
inequality constraints that couple all the agents' optimization variables. In addition, similar 
to [17], the considered problem has a global (non-separable) convex cost function that is a 
function of the sum of local mapping functions of the local optimization variables. Such a
problem formulation appears, for example, in the classical regression problems which have a 
wide range of applications. In addition, the considered formulation also arises in the demand 
response control and power flow control problems in the emerging smart grid systems [18]-[20]. 
More discussions about applications are presented in Section II-B. 

In this paper, we assume that each agent knows only the local mapping function and local
constraint function. To solve this problem in a distributed fashion, we develop a
novel distributed consensus-based primal-dual perturbation (PDP) algorithm, which combines
the ideas of the primal-dual perturbed (sub-)gradient method [21], [22] and the average consensus 
techniques [10], [23], [24]. In each iteration of the proposed algorithm, agents exchange their 
local estimates of the global cost and constraint functions with their neighbors, followed by 
performing one step of the primal-dual variable (sub-)gradient update. Instead of using the primal-dual
iterates computed at the preceding iteration as in most of the existing primal-dual subgradient
based methods [15], [16], the (sub-)gradients in the proposed distributed PDP algorithm are 
computed based on some perturbation points which can be efficiently computed using the 
messages exchanged from neighbors. In particular, we provide two efficient ways to compute the 
perturbation points that can respectively handle the smooth and non-smooth constraint functions.
More importantly, we build convergence analysis results showing that the proposed distributed 
PDP algorithm ensures a strong convergence of the local primal-dual iterates to a global optimal 
primal-dual solution of the considered problem. The proposed algorithm is applied to a distributed 
sparse regression problem and a distributed demand response control problem in smart grid. 
Numerical results for the two applications are presented to demonstrate the effectiveness of the 
proposed algorithm. 

Related works: The distributed dual subgradient method (e.g., dual decomposition) [25] is a
popular approach to solving a problem with coupled inequality constraints in a distributed 
manner. However, given the dual variables, this method requires the agents to globally solve 
the local subproblems, which may require considerable computational efforts if the local cost 
and constraint functions have some complex structure. Consensus-based distributed primal-dual 
(PD) subgradient methods have been developed recently in [15], [16] for solving a problem 
with an objective function which is the sum of local convex cost functions, and with global 
convex inequality constraints. In addition to having a different cost function from our problem 
formulation, the works in [15], [16] assumed that all the agents in the network have global 
knowledge of the inequality constraint function; the two are in sharp contrast to our formulation 
where a non-separable objective function is considered and each agent can access only its local 
constraint function. Moreover, these works adopted the conventional PD subgradient updates 
[26], [27] without perturbation. Numerical results will show that these methods do not perform 
as well as the proposed algorithm with perturbation. Another recent development is the Bregman- 
distance based PD subgradient method proposed in [28] for solving an epigraph formulation of 
a min-max problem. The method in [28], however, assumes that the Lagrangian function has 
a unique saddle point, in order to guarantee the convergence of the primal-dual iterates. In 
contrast, our proposed algorithm, which uses the perturbed subgradients, does not require such
an assumption.

Synopsis: Section II presents the problem formulation, applications, and a brief review of the 
centralized PD subgradient methods. Section III presents the proposed distributed consensus- 
based PDP algorithm. The assumptions and convergence analysis results are given in Section 
IV. Numerical results are presented in Section V. Finally, conclusions and a discussion of
future extensions are given in Section VI.

II. Problem Formulation, Applications and Brief Review

A. Problem Formulation 

We consider a network with $N$ agents, denoted by $\mathcal{V} = \{1, \ldots, N\}$. We assume that, for
all $i = 1, \ldots, N$, each agent $i$ has a local decision variable^1 $x_i \in \mathbb{R}^K$, a local constraint set
$\mathcal{X}_i \subseteq \mathbb{R}^K$, and a local mapping function $f_i : \mathbb{R}^K \to \mathbb{R}^M$, in which $f_i = (f_{i1}, \ldots, f_{iM})^T$ with
each $f_{im} : \mathbb{R}^K \to \mathbb{R}$ being continuous. The network cost function is given by

$$\bar{\mathcal{F}}(x_1, \ldots, x_N) \triangleq \mathcal{F}\Big(\sum_{i=1}^N f_i(x_i)\Big), \tag{1}$$

where $\mathcal{F} : \mathbb{R}^M \to \mathbb{R}$ and $\bar{\mathcal{F}} : \mathbb{R}^{NK} \to \mathbb{R}$ are continuous. In addition, the agents are subject to a
global inequality constraint $\sum_{i=1}^N g_i(x_i) \preceq 0$, where $g_i : \mathbb{R}^K \to \mathbb{R}^P$ are continuous mappings for
all $i = 1, \ldots, N$; specifically, $g_i = (g_{i1}, \ldots, g_{iP})^T$, with each $g_{ip} : \mathbb{R}^K \to \mathbb{R}$ being continuous.
The vector inequality $\sum_{i=1}^N g_i(x_i) \preceq 0$ is understood coordinate-wise.

We assume that each agent $i$ can access $\mathcal{F}(\cdot)$, $f_i(\cdot)$, $g_i(\cdot)$ and $\mathcal{X}_i$ only, for all $i = 1, \ldots, N$.
Under this local knowledge constraint, the agents seek to cooperate with each other to minimize
the total network cost $\bar{\mathcal{F}}(x_1, \ldots, x_N)$ (or maximize the network utility $-\bar{\mathcal{F}}(x_1, \ldots, x_N)$).
Mathematically, the optimization problem can be formulated as follows:

$$\min_{x_i \in \mathcal{X}_i,\ i=1,\ldots,N} \ \bar{\mathcal{F}}(x_1, \ldots, x_N) \quad \text{s.t.} \quad \sum_{i=1}^N g_i(x_i) \preceq 0. \tag{2}$$

The goal of this paper is to develop a distributed algorithm for solving (2) with each agent
communicating with its neighbors only.

B. Applications 

In this subsection, we discuss some applications where the problem formulation (2) may arise. 

Smart grid demand response and power flow control: Consider a power grid system where
a retailer (e.g., the utility company) bids electricity from the power market and serves a
residential/industrial neighborhood with $N$ customers. In addition to paying for its market bid, the
retailer has to pay additional cost if there is a deviation between the bid purchased in earlier
market settlements and the real-time aggregate load of the customers. Any demand excess or
shortfall results in a cost for the retailer that mirrors the effort to maintain the power balance.
In the smart grid, thanks to the advances in communication and sensory technologies, it is
envisioned that the retailer can observe the load of customers and can even control the power
usage of some of the appliances (e.g., controlling the charging rate of electric vehicles and
turning ON/OFF air conditioning systems), which is known as the demand response (DR) control
problem; see [29] for a recent review.

^1 Here, without loss of generality, we assume that all the agents have the same variable dimension $K$. The proposed algorithm
and analysis can be easily generalized to the case with different variable dimensions.

We let $p_t$, $t = 1, \ldots, T$, be the power bids over a time horizon of length $T$, and let $\psi_{i,t}(x_i)$,
$t = 1, \ldots, T$, be the load profile of customer $i$, where $x_i \in \mathbb{R}^K$ contains some control variables.
The structures of $\psi_{i,t}$ and $x_i$ depend on the appliance load model. As mentioned, the retailer
aims to minimize the cost caused by power imbalance, e.g., [18], [19], [29]

$$\min_{x_1 \in \mathcal{X}_1, \ldots, x_N \in \mathcal{X}_N} \ C_p\Big[\Big(\sum_{i=1}^N \psi_i(x_i) - p\Big)^+\Big] + C_s\Big[\Big(p - \sum_{i=1}^N \psi_i(x_i)\Big)^+\Big], \tag{3}$$

where $(x)^+ = \max\{x, 0\}$, $\mathcal{X}_i$ denotes the local control set and $C_p, C_s : \mathbb{R}^T \to \mathbb{R}$ denote the
cost functions due to insufficient and excessive power bids, respectively. Moreover, let
$p = (p_1, \ldots, p_T)^T$ and $\psi_i = (\psi_{i,1}, \ldots, \psi_{i,T})^T$. By defining $z = (\sum_{i=1}^N \psi_i(x_i) - p)^+$ and assuming
that $C_p$ is monotonically increasing, one can write (3) as

$$\min_{x_i \in \mathcal{X}_i,\ i=1,\ldots,N,\ z \succeq 0} \ C_p[z] + C_s\Big[z - \sum_{i=1}^N \psi_i(x_i) + p\Big] \quad \text{s.t.} \quad \sum_{i=1}^N \psi_i(x_i) - p - z \preceq 0, \tag{4}$$



which belongs to the considered formulation in (2). Similar problem formulations also arise in 
the microgrid control problems [20], [30] where the microgrid controller requires not only to 
control the loads but also to control the local power generation and local power storage (i.e., 
power flow control), in order to maintain power balancing within the microgrid; see [30] for 
detailed formulations. 
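
The rewriting of (3) as (4) relies on the entrywise identity $(p - s)^+ = (s - p)^+ - (s - p)$ with $s = \sum_{i=1}^N \psi_i(x_i)$, so that the argument of $C_s$ becomes $z - s + p$ once $z = (s - p)^+$. A minimal numerical check of this identity is sketched below; the load and bid values are made up for illustration and the helper name `pos` is hypothetical.

```python
def pos(v):
    """Entrywise positive part (v)^+ = max{v, 0}."""
    return [max(t, 0.0) for t in v]

# Hypothetical aggregate load s = sum_i psi_i(x_i) and power bids p over T = 4 slots.
s = [3.0, 1.5, 2.0, 0.5]
p = [2.0, 2.0, 2.0, 1.0]

# z = (s - p)^+ is the auxiliary variable introduced to obtain (4).
z = pos([si - pi for si, pi in zip(s, p)])

# Argument of C_s in (3): (p - s)^+.
excess_bid = pos([pi - si for si, pi in zip(s, p)])

# Argument of C_s in (4): z - s + p.
rewritten = [zi - si + pi for zi, si, pi in zip(z, s, p)]

# Constraint of (4): s - p - z <= 0, entrywise.
slack = [si - pi - zi for si, pi, zi in zip(s, p, z)]
```

Here `rewritten` coincides with `excess_bid`, and every entry of `slack` is non-positive, which is the constraint in (4).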

Distributed control methods are appealing to the smart grid application since all the agents 
within the systems are identical and failure of one agent would not have significant impact 
on the performance of the whole system [31]. Besides, they also spare the retailer/microgrid
controller from the task of collecting real-time load information from the customers, which not
only infringes on the customers' privacy but also is not easy for a large-scale neighborhood. In
Section V, the proposed distributed algorithm will be applied to a DR control problem as in (4). 
Distributed regression: Regression involves modeling a response signal (e.g., observation
or output of an unknown system) as a function of some regression parameters, which has wide
applications, including control [5], machine learning [6], [7], data mining [32], [33] and image
processing [7]. The goal in regression is to find the regression parameters so that the predictor
function output can best represent the response signal. Let us consider a multi-agent scenario
where each of the agents owns a local predictor function. Let $r \in \mathbb{R}^M$ be a response signal
that is known to all agents, and let $\phi_i(x_i)$ be the local predictor function at agent $i$, where
$\phi_i : \mathbb{R}^K \to \mathbb{R}^M$ and $x_i$ is the regression parameter. In some applications such as distributed data
mining between heterogeneous sites [32], [33] (i.e., so-called vertically partitioned data [17],
[34], [35]), the agents have to generate a global data model by combining the local analysis
results. In such a case, the regression problem is to minimize

$$\ell\Big(r - \sum_{i=1}^N \phi_i(x_i)\Big), \tag{5}$$

where $\ell : \mathbb{R}^M \to \mathbb{R}$ stands for some loss function. The regression parameters across the network
may have to satisfy certain constraints. For example, it is desirable that the values in $(x_1, \ldots, x_N)$
are sparse, which will facilitate the storage of these local analysis results [36]. Sparsity promoting
constraints, such as the one-norm constraint

$$\sum_{i=1}^N w_i \|x_i\|_1 \le k_0, \tag{6}$$

can be imposed for such purpose, where $w_i > 0$ are some weights and $k_0$ specifies the sparsity
level. Note that recent works in control [37], [38] considered sparsity of some control signals
and thus also involve dealing with sparsity promoting functions. The problem formulation in
(5) and (6) thus falls within the category of formulation (2). In Section V, we will also examine
the proposed distributed algorithm by considering a sparse regression problem as in (5) and (6).

In addition to the above two applications, formulation (2) also encompasses the network flow 
control problems [39] where flow control is usually subject to capacity and flow conservation 
constraints; see [40] for an example which considered maximizing the network lifetime. 

C. Review of Centralized PD Subgradient Method

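Let us consider the following Lagrange dual problem of (2):

$$\max_{\lambda \succeq 0} \Big\{ \min_{x \in \mathcal{X}} \mathcal{L}(x, \lambda) \Big\}, \tag{7}$$

where $x = (x_1^T, \ldots, x_N^T)^T$, $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_N$, $\lambda \in \mathbb{R}^P_+$ ($\mathbb{R}^P_+$ is the non-negative orthant
in $\mathbb{R}^P$) is the dual variable associated with the inequality constraint $\sum_{i=1}^N g_i(x_i) \preceq 0$, and
$\mathcal{L} : \mathbb{R}^{NK} \times \mathbb{R}^P_+ \to \mathbb{R}$ is the Lagrangian function, given by

$$\mathcal{L}(x, \lambda) = \bar{\mathcal{F}}(x_1, \ldots, x_N) + \lambda^T \Big(\sum_{i=1}^N g_i(x_i)\Big). \tag{8}$$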

We assume that strong duality holds for problem (2) [41]: 

Assumption 1 Problem (2) is a convex problem and Slater's condition holds, i.e., there is an
$(\bar{x}_1, \ldots, \bar{x}_N)$ that lies in the relative interior of $\mathcal{X}_1 \times \cdots \times \mathcal{X}_N$ such that $\sum_{i=1}^N g_i(\bar{x}_i) \prec 0$.

Under such a condition, one is able to handle (2) by solving its dual in (7). A classical approach
along this line is the dual subgradient method [42]. Specifically, given a dual variable $\lambda^{(k-1)}$ at
iteration $k$, one solves the inner minimization problem

$$x^{(k)} = \arg\min_{x \in \mathcal{X}} \mathcal{L}(x, \lambda^{(k-1)}), \tag{9}$$

followed by updating the dual variable by $\lambda^{(k)} = (\lambda^{(k-1)} + a_k \sum_{i=1}^N g_i(x_i^{(k)}))^+$ for the outer
maximization part in (7), where $a_k > 0$ is the step size. One limitation of the dual subgradient
method is that the inner problem (9) needs to be globally solved, which, however, is not always
easy. Even when $\bar{\mathcal{F}}(x) = \sum_{i=1}^N f_i(x_i)$, for which (9) can be decomposed into $N$ parallel
subproblems, attaining the global optimum for each subproblem may still require considerable
computational efforts if $f_i(x_i)$, $g_i(x_i)$ and the local set $\mathcal{X}_i$ have complex structures. One should
note that the dual decomposition method [25] is exactly based on the dual subgradient method.
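Another approach to dealing with (7) is the (centralized) primal-dual (PD) subgradient method
[26], [43] which replaces (9) by a simple primal subgradient update. More precisely, the PD
subgradient method can be described as follows. At iteration $k$, perform

$$x^{(k)} = \mathcal{P}_{\mathcal{X}}\big(x^{(k-1)} - a_k\, \mathcal{L}_x(x^{(k-1)}, \lambda^{(k-1)})\big), \tag{10a}$$

$$\lambda^{(k)} = \big(\lambda^{(k-1)} + a_k\, \mathcal{L}_\lambda(x^{(k-1)}, \lambda^{(k-1)})\big)^+, \tag{10b}$$

where $\mathcal{P}_{\mathcal{X}} : \mathbb{R}^{NK} \to \mathcal{X}$ is a projection function, $a_k > 0$ is the step size, and

$$\mathcal{L}_x(x^{(k)}, \lambda^{(k)}) = \begin{bmatrix} \mathcal{L}_{x_1}(x^{(k)}, \lambda^{(k)}) \\ \vdots \\ \mathcal{L}_{x_N}(x^{(k)}, \lambda^{(k)}) \end{bmatrix} = \begin{bmatrix} \nabla f_1^T(x_1^{(k)})\, \nabla\mathcal{F}\big(\sum_{i=1}^N f_i(x_i^{(k)})\big) + \nabla g_1^T(x_1^{(k)})\, \lambda^{(k)} \\ \vdots \\ \nabla f_N^T(x_N^{(k)})\, \nabla\mathcal{F}\big(\sum_{i=1}^N f_i(x_i^{(k)})\big) + \nabla g_N^T(x_N^{(k)})\, \lambda^{(k)} \end{bmatrix}, \tag{11a}$$

$$\mathcal{L}_\lambda(x^{(k)}, \lambda^{(k)}) = \sum_{i=1}^N g_i(x_i^{(k)}), \tag{11b}$$

represent the subgradients of $\mathcal{L}$ at $(x^{(k)}, \lambda^{(k)})$ with respect to $x$ and $\lambda$, respectively. Each
$\nabla g_i(x_i^{(k)})$ is a $P \times K$ matrix with rows equal to the subgradients $\nabla g_{ip}^T(x_i^{(k)})$, $p = 1, \ldots, P$
(gradients if they are continuously differentiable), and each $\nabla f_i(x_i^{(k)})$ is an $M \times K$ matrix with
rows containing the gradients $\nabla f_{im}^T(x_i^{(k)})$, $m = 1, \ldots, M$.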

The idea behind the PD subgradient method lies in the following well-known saddle-point 
relation, provided that the strong duality holds: 

Theorem 1 (Saddle-Point Theorem) [41] The point $(x^\star, \lambda^\star) \in \mathcal{X} \times \mathbb{R}^P_+$ is a primal-dual solution
pair of problems (2) and (7) if and only if there holds:

$$\mathcal{L}(x^\star, \lambda) \le \mathcal{L}(x^\star, \lambda^\star) \le \mathcal{L}(x, \lambda^\star) \quad \forall x \in \mathcal{X},\ \lambda \succeq 0. \tag{12}$$

According to Theorem 1, if the PD subgradient method converges to a saddle point of the
Lagrangian function (8), then it solves the original problem (2). Convergence properties of the
PD method in (10) have been studied extensively; see, for example, [26], [27], [43]. In such
methods, typically a subsequence of the sequence $(x^{(k)}, \lambda^{(k)})$ converges to a saddle point of the
Lagrangian function in (8). To ensure the convergence of the whole sequence $(x^{(k)}, \lambda^{(k)})$, it is
often assumed that the Lagrangian function is strictly convex in $x$ and strictly concave in $\lambda$,
which, however, does not hold in general.

One of the approaches to circumventing this condition is the primal-dual perturbed (PDP)
subgradient method in [21], [22]. Specifically, [21] suggests updating $x^{(k-1)}$ and $\lambda^{(k-1)}$ based
on some perturbation points, denoted by $\alpha^{(k)}$ and $\beta^{(k)}$, respectively. The PDP updates are

$$x^{(k)} = \mathcal{P}_{\mathcal{X}}\big(x^{(k-1)} - a_k\, \mathcal{L}_x(x^{(k-1)}, \beta^{(k)})\big), \tag{13a}$$

$$\lambda^{(k)} = \big(\lambda^{(k-1)} + a_k\, \mathcal{L}_\lambda(\alpha^{(k)}, \lambda^{(k-1)})\big)^+. \tag{13b}$$

Note that, in (13a), we have replaced $\lambda^{(k-1)}$ by $\beta^{(k)}$, and, in (13b), replaced $x^{(k-1)}$ by $\alpha^{(k)}$, and
thus $\mathcal{L}_x(x^{(k-1)}, \beta^{(k)})$ and $\mathcal{L}_\lambda(\alpha^{(k)}, \lambda^{(k-1)})$ are perturbed subgradients. It was shown in [21] that,
with carefully chosen $(\alpha^{(k)}, \beta^{(k)})$ and the step size $a_k$, the primal-dual iterates in (13) converge
to a saddle point of (7), without any strict convexity and concavity assumptions on $\mathcal{L}$.

There are several ways to generate the perturbation points $\alpha^{(k)}$ and $\beta^{(k)}$. Our interests lie
specifically in those that are computationally as efficient as the PD subgradient updates in (13).
Depending on the smoothness of $\{g_{ip}\}_{p=1}^P$, we consider the following two methods:

Gradient Perturbation Points: A simple approach to computing the perturbation points is
using the conventional gradient updates exactly as in (10), i.e.,

$$\alpha^{(k)} = \mathcal{P}_{\mathcal{X}}\big(x^{(k-1)} - \rho_1\, \mathcal{L}_x(x^{(k-1)}, \lambda^{(k-1)})\big), \tag{14a}$$

$$\beta^{(k)} = \big(\lambda^{(k-1)} + \rho_2\, \mathcal{L}_\lambda(x^{(k-1)}, \lambda^{(k-1)})\big)^+, \tag{14b}$$

where $\rho_1 > 0$ and $\rho_2 > 0$ are constants. The PDP subgradient method thus combines (13)
and (14), which involve two primal and dual subgradient updates. Even though the updates are
relatively simple, this method requires smooth constraint functions $g_{ip}$, $p = 1, \ldots, P$.
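
To make the difference between the plain PD updates (10) and the perturbed updates (13) with the gradient perturbation points (14) concrete, the sketch below runs both on a one-dimensional toy instance. The instance is an assumption chosen for illustration and is not taken from the paper: minimize $x$ over $\mathcal{X} = [-2, 2]$ subject to $g(x) = -x \le 0$, so that $\mathcal{L}(x, \lambda) = x - \lambda x$ has the saddle point $(0, 1)$. The step sizes, the starting point and all function names are likewise hypothetical.

```python
import math

def clip(v, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))

def L_x(x, lam):
    """Partial derivative of L(x, lam) = x - lam * x with respect to x."""
    return 1.0 - lam

def L_lam(x, lam):
    """Partial derivative of L with respect to lam, i.e. g(x) = -x."""
    return -x

def run(perturbed, iters=2000, rho1=0.5, rho2=0.5):
    """Return the final distance to the saddle point (x, lam) = (0, 1)."""
    x, lam = 0.5, 1.0
    for k in range(1, iters + 1):
        a_k = 1.0 / (k + 1)  # diminishing step size
        if perturbed:
            alpha = clip(x - rho1 * L_x(x, lam), -2.0, 2.0)   # as in (14a)
            beta = max(lam + rho2 * L_lam(x, lam), 0.0)        # as in (14b)
        else:
            alpha, beta = x, lam                               # plain PD, as in (10)
        x_new = clip(x - a_k * L_x(x, beta), -2.0, 2.0)        # as in (13a)
        lam_new = max(lam + a_k * L_lam(alpha, lam), 0.0)      # as in (13b)
        x, lam = x_new, lam_new
    return math.hypot(x - 0.0, lam - 1.0)

dist_pd = run(perturbed=False)
dist_pdp = run(perturbed=True)
```

On this bilinear instance the plain PD iterates rotate around the saddle point and never get closer than their starting distance, whereas the perturbed iterates contract towards it. This mirrors, on a small scale, the lack of strict convexity and concavity discussed above; it is not a substitute for the convergence analysis in [21].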

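Proximal Perturbation Points: In cases where $g_{ip}$, $p = 1, \ldots, P$, are non-smooth, we compute
the perturbation point $\alpha^{(k)}$ by the following proximal gradient update^2 [44]:

$$\alpha^{(k)} = \arg\min_{\alpha \in \mathcal{X}} \Big\{ \sum_{i=1}^N g_i^T(\alpha_i)\, \lambda^{(k-1)} + \frac{1}{2\rho_1} \Big\| \alpha - \big(x^{(k-1)} - \rho_1 \nabla\bar{\mathcal{F}}(x^{(k-1)})\big) \Big\|^2 \Big\} \tag{15}$$

$$\phantom{\alpha^{(k)}} = \arg\min_{\alpha \in \mathcal{X}} \Big\{ \sum_{i=1}^N g_i^T(\alpha_i)\, \lambda^{(k-1)} + (\alpha - x^{(k-1)})^T \nabla\bar{\mathcal{F}}(x^{(k-1)}) + \frac{1}{2\rho_1} \big\| \alpha - x^{(k-1)} \big\|^2 \Big\}, \tag{16}$$

where $\alpha = (\alpha_1^T, \ldots, \alpha_N^T)^T$ and

$$\nabla\bar{\mathcal{F}}(x^{(k-1)}) = \begin{bmatrix} \nabla f_1^T(x_1^{(k-1)})\, \nabla\mathcal{F}\big(\sum_{i=1}^N f_i(x_i^{(k-1)})\big) \\ \vdots \\ \nabla f_N^T(x_N^{(k-1)})\, \nabla\mathcal{F}\big(\sum_{i=1}^N f_i(x_i^{(k-1)})\big) \end{bmatrix}. \tag{17}$$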

It is worthwhile to note that, when $g_{ip}$, $p = 1, \ldots, P$, are some sparsity promoting functions (e.g.,
the 1-norm, 2-norm and the nuclear norm) that often arise in sparse regression problems [7],
[35], [45], the proximal perturbation point in (15) can be solved very efficiently and may even
have closed-form solutions. For example, if $g_i(\alpha_i) = \|\alpha_i\|_1$ for all $i$ ($P = 1$), and $\mathcal{X} = \mathbb{R}^{NK}$,
(15) has a closed-form solution known as the soft thresholding operator [7]:

$$\alpha^{(k)} = \big(b - \lambda^{(k-1)} \rho_1 \mathbf{1}\big)^+ - \big(-b - \lambda^{(k-1)} \rho_1 \mathbf{1}\big)^+, \tag{18}$$

where $b = x^{(k-1)} - \rho_1 \nabla\bar{\mathcal{F}}(x^{(k-1)})$ and $\mathbf{1}$ is an all-one vector. To the best of our knowledge,
the proximal perturbation point (15) is novel, as it has not appeared in earlier works [21], [22].

^2 If not mentioned specifically, the norm function $\|\cdot\|$ stands for the Euclidean norm.
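
The soft thresholding operator mentioned above, which solves (15) when $g_i(\alpha_i) = \|\alpha_i\|_1$, $P = 1$ and the problem is unconstrained, can be sketched in a few lines. The snippet below is illustrative only: the numerical values of $b$, $\lambda^{(k-1)}$ and $\rho_1$ are made up, and the function names are hypothetical. It also evaluates the objective of (15) so that the closed-form point can be compared against nearby points.

```python
def soft_threshold(b, tau):
    """Entrywise (b - tau)^+ - (-b - tau)^+, i.e. sign(b) * max(|b| - tau, 0)."""
    return [max(t - tau, 0.0) - max(-t - tau, 0.0) for t in b]

def prox_objective(alpha, b, lam, rho1):
    """lam * ||alpha||_1 + ||alpha - b||^2 / (2 * rho1): the objective of (15)
    for the 1-norm constraint function with scalar dual variable lam."""
    l1 = sum(abs(a) for a in alpha)
    sq = sum((a - t) ** 2 for a, t in zip(alpha, b))
    return lam * l1 + sq / (2.0 * rho1)

# Hypothetical data: b stands for x^(k-1) - rho1 * grad F_bar(x^(k-1)).
b = [1.5, -0.2, 0.0, -2.0]
lam, rho1 = 0.8, 0.5

# Threshold level is rho1 * lam, matching the closed-form solution of (15).
alpha = soft_threshold(b, rho1 * lam)
best = prox_objective(alpha, b, lam, rho1)
```

Entries of `b` with magnitude below `rho1 * lam` are set to zero and the remaining ones are shrunk towards zero by that amount, which is how the perturbation point promotes sparsity.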

III. Proposed Consensus-Based Distributed PDP Algorithm 

Our goal is to develop a distributed counterpart of the PDP subgradient method in (13). Let
us recall Assumption 1 and consider the following saddle-point problem

$$\max_{\lambda \in \mathcal{D}} \Big\{ \min_{x_i \in \mathcal{X}_i,\ i=1,\ldots,N} \mathcal{L}(x_1, \ldots, x_N, \lambda) \Big\}, \tag{19}$$

where

$$\mathcal{D} = \Big\{ \lambda \succeq 0 \ \Big|\ \|\lambda\| \le D_\lambda \triangleq \frac{\bar{\mathcal{F}}(\bar{x}) - \tilde{q}}{\gamma} + \delta \Big\}, \tag{20}$$

in which $\bar{x} = (\bar{x}_1^T, \ldots, \bar{x}_N^T)^T$ is a Slater point of (2), $\tilde{q} = \min_{x_i \in \mathcal{X}_i,\, i=1,\ldots,N} \mathcal{L}(x_1, \ldots, x_N, \tilde{\lambda})$ is
the dual function value for some arbitrary $\tilde{\lambda} \succeq 0$, $\gamma = \min_{p=1,\ldots,P} \{-\sum_{i=1}^N g_{ip}(\bar{x}_i)\}$, and $\delta > 0$
is arbitrary. It has been shown in [46] that, under Assumption 1, the optimal dual solution of
(7), denoted by $\lambda^\star$, satisfies

$$\|\lambda^\star\| \le \frac{\bar{\mathcal{F}}(\bar{x}) - \tilde{q}}{\gamma}, \tag{21}$$

and thus $\lambda^\star$ lies in $\mathcal{D}$. Here we consider the saddle point problem (19), instead of the original
Lagrange dual problem (7), because $\mathcal{D}$ bounds the dual variable $\lambda$ and thus also bounds the
subgradient $\mathcal{L}_x(x^{(k)}, \lambda^{(k)})$ in (11a). This property is important in building the convergence of
the distributed algorithm to be discussed shortly. Both (7) and (19) have the same optimal dual
solution $\lambda^\star$ and attain the same optimal objective value. One can further verify that any saddle
point of (7) is also a saddle point of (19). However, to prove the converse, some conditions are
needed, as given in the following proposition.

Proposition 1 (Primal-dual optimality conditions) [47] Suppose that Assumption 1 holds. Let
$(x_1^\star, \ldots, x_N^\star, \lambda^\star)$ be a saddle point of (19). Then $(x_1^\star, \ldots, x_N^\star)$ is an optimal solution for problem
(2) if and only if

$$\sum_{i=1}^N g_i(x_i^\star) \preceq 0 \quad \text{and} \quad (\lambda^\star)^T \Big(\sum_{i=1}^N g_i(x_i^\star)\Big) = 0.$$

Proposition 1 implies that if a saddle point of (19) is primal feasible and satisfies the complementary
slackness condition, then it is also a saddle point of problem (7), i.e., an optimal
primal-dual solution pair of problem (2).
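
Working with the bounded dual set in (20) requires a projection onto the intersection of the non-negative orthant and a Euclidean ball centered at the origin. A minimal sketch is given below. It uses the standard fact that, for a closed convex cone intersected with an origin-centered ball, the projection can be obtained by first projecting onto the cone (clipping at zero) and then rescaling onto the ball. The function names and all numerical inputs are hypothetical and only serve to illustrate how $D_\lambda$ in (20) would be evaluated.

```python
import math

def dual_radius(cost_at_slater, dual_value, gamma, delta):
    """D_lambda as in (20): (F_bar(x_bar) - q_tilde) / gamma + delta."""
    return (cost_at_slater - dual_value) / gamma + delta

def project_onto_D(lam, radius):
    """Project lam onto {lam >= 0, ||lam|| <= radius}: clip at zero, then rescale."""
    clipped = [max(v, 0.0) for v in lam]
    norm = math.sqrt(sum(v * v for v in clipped))
    if norm <= radius:
        return clipped
    return [v * radius / norm for v in clipped]

# Made-up numbers: cost at a Slater point, a dual function value, the Slater margin
# gamma, and the arbitrary slack delta.
D_lam = dual_radius(cost_at_slater=5.0, dual_value=1.0, gamma=2.0, delta=0.5)
projected = project_onto_D([3.0, -1.0, 4.0], D_lam)
```

In this example the negative entry is clipped to zero and the remaining vector, whose norm exceeds `D_lam`, is scaled back onto the ball of radius `D_lam`.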

To have a distributed optimization algorithm for solving (19), in addition to $x_i^{(k)}$, we let each
agent $i$ have a local copy of the dual iterate $\lambda^{(k)}$, denoted by $\lambda_i^{(k)}$. Moreover, each agent $i$ owns
two auxiliary variables, denoted by $y_i^{(k)}$ and $z_i^{(k)}$, representing respectively the local estimates
of the average values of the argument function $\frac{1}{N}\sum_{i=1}^N f_i(x_i^{(k)})$ and of the inequality constraint
function $\frac{1}{N}\sum_{i=1}^N g_i(x_i^{(k)})$, for all $i = 1, \ldots, N$. We consider a time-varying synchronous network
model [11], where the network of agents at time $k$ is represented by a weighted directed graph
$\mathcal{G}(k) = (\mathcal{V}, \mathcal{E}(k), W(k))$. Here $(i, j) \in \mathcal{E}(k)$ if and only if agent $i$ can receive messages from
agent $j$, and $W(k) \in \mathbb{R}^{N \times N}$ is a weight matrix with each entry $[W(k)]_{ij}$ representing a weight
that agent $i$ assigns to the incoming message on link $(i, j)$ at time $k$. If $(i, j) \in \mathcal{E}(k)$, then
$[W(k)]_{ij} > 0$, and $[W(k)]_{ij} = 0$ otherwise. The agents exchange messages with their neighbors
(according to the network graph $\mathcal{G}(k)$) in order to achieve consensus on $\lambda^{(k)}$, $\sum_{i=1}^N g_i(x_i^{(k)})$
and $\sum_{i=1}^N f_i(x_i^{(k)})$, while computing local perturbation points and primal-dual (sub-)gradient
updates locally. Specifically, the proposed distributed consensus-based PDP method consists of
the following steps at each iteration $k$:

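1) Averaging consensus: For $i = 1, \ldots, N$, each agent $i$ sends $y_i^{(k-1)}$, $z_i^{(k-1)}$ and $\lambda_i^{(k-1)}$ to
all its neighbors $j$ satisfying $(j, i) \in \mathcal{E}(k)$; it also receives $y_j^{(k-1)}$, $z_j^{(k-1)}$ and $\lambda_j^{(k-1)}$ from its
neighbors, and combines the received estimates, as follows:

$$\tilde{y}_i^{(k)} = \sum_{j=1}^N [W(k)]_{ij}\, y_j^{(k-1)}, \quad \tilde{z}_i^{(k)} = \sum_{j=1}^N [W(k)]_{ij}\, z_j^{(k-1)}, \quad \tilde{\lambda}_i^{(k)} = \sum_{j=1}^N [W(k)]_{ij}\, \lambda_j^{(k-1)}. \tag{22}$$

2) Perturbation point computation: For $i = 1, \ldots, N$, if functions $g_{ip}$, $p = 1, \ldots, P$, are
smooth, then each agent $i$ computes the local perturbation points by

$$\alpha_i^{(k)} = \mathcal{P}_{\mathcal{X}_i}\Big(x_i^{(k-1)} - \rho_1 \big[\nabla f_i^T(x_i^{(k-1)})\, \nabla\mathcal{F}(N \tilde{y}_i^{(k)}) + \nabla g_i^T(x_i^{(k-1)})\, \tilde{\lambda}_i^{(k)}\big]\Big), \tag{23a}$$

$$\beta_i^{(k)} = \mathcal{P}_{\mathcal{D}}\big(\tilde{\lambda}_i^{(k)} + \rho_2\, N \tilde{z}_i^{(k)}\big). \tag{23b}$$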

Note that, compared to (14) and (15), agent $i$ here uses the most up-to-date estimates $N \tilde{y}_i^{(k)}$,
$N \tilde{z}_i^{(k)}$ and $\tilde{\lambda}_i^{(k)}$ in place of $\sum_{i=1}^N f_i(x_i^{(k-1)})$, $\sum_{i=1}^N g_i(x_i^{(k-1)})$ and $\lambda^{(k-1)}$. If $g_{ip}$, $p = 1, \ldots, P$,
are non-smooth, agent $i$ instead computes $\alpha_i^{(k)}$ by

$$\alpha_i^{(k)} = \arg\min_{\alpha_i \in \mathcal{X}_i} \Big\{ g_i^T(\alpha_i)\, \tilde{\lambda}_i^{(k)} + \frac{1}{2\rho_1} \Big\| \alpha_i - \big(x_i^{(k-1)} - \rho_1 \nabla f_i^T(x_i^{(k-1)})\, \nabla\mathcal{F}(N \tilde{y}_i^{(k)})\big) \Big\|^2 \Big\}, \tag{24}$$

for $i = 1, \ldots, N$.

Algorithm 1 Distributed Consensus-Based PDP Algorithm

1: Given initial variables $x_i^{(0)} \in \mathcal{X}_i$, $\lambda_i^{(0)} \succeq 0$, $y_i^{(0)} = f_i(x_i^{(0)})$ and $z_i^{(0)} = g_i(x_i^{(0)})$ for each
   agent $i$, $i = 1, \ldots, N$; Set $k = 1$.
2: repeat
3:   Averaging consensus: For $i = 1, \ldots, N$, each agent $i$ computes (22).
4:   Perturbation point computation: For $i = 1, \ldots, N$, if $\{g_{ip}\}_{p=1}^P$ are smooth, then each
     agent $i$ computes the local perturbation points by (23); otherwise, each agent $i$ instead
     computes $\alpha_i^{(k)}$ by (24).
5:   Local variable updates: For $i = 1, \ldots, N$, each agent $i$ updates (25), (26), (27) and (28)
     sequentially.
6:   Set $k = k + 1$.
7: until a predefined stopping criterion (e.g., a maximum iteration number) is satisfied.

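3) Primal-dual perturbed subgradient update: For $i = 1, \ldots, N$, each agent $i$ updates its
primal and dual variables $(x_i^{(k)}, \lambda_i^{(k)})$ based on the local perturbation point $(\alpha_i^{(k)}, \beta_i^{(k)})$:

$$x_i^{(k)} = \mathcal{P}_{\mathcal{X}_i}\Big(x_i^{(k-1)} - a_k \big[\nabla f_i^T(x_i^{(k-1)})\, \nabla\mathcal{F}(N \tilde{y}_i^{(k)}) + \nabla g_i^T(x_i^{(k-1)})\, \beta_i^{(k)}\big]\Big), \tag{25}$$

$$\lambda_i^{(k)} = \mathcal{P}_{\mathcal{D}}\big(\tilde{\lambda}_i^{(k)} + a_k\, g_i(\alpha_i^{(k)})\big). \tag{26}$$

4) Auxiliary variable update: For $i = 1, \ldots, N$, each agent $i$ updates variables $y_i^{(k)}$, $z_i^{(k)}$ with
the changes of the local argument function $f_i(x_i^{(k)})$ and the constraint function $g_i(x_i^{(k)})$:

$$y_i^{(k)} = \tilde{y}_i^{(k)} + f_i(x_i^{(k)}) - f_i(x_i^{(k-1)}), \tag{27}$$

$$z_i^{(k)} = \tilde{z}_i^{(k)} + g_i(x_i^{(k)}) - g_i(x_i^{(k-1)}). \tag{28}$$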



Algorithm 1 summarizes the above steps. We prove that Algorithm 1 converges under proper 
problem and network assumptions in the next section. Readers who are interested more in 
numerical performance of Algorithm 1 may go directly to Section V. 
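
The four steps of Algorithm 1 can be sketched on a small scalar instance as follows. Everything about the instance is an assumption made for illustration and does not come from the paper: four agents on a ring with fixed weights, $K = M = P = 1$, $\mathcal{X}_i = [0, 1]$, $f_i(x_i) = x_i$, $\mathcal{F}(s) = (s - c)^2$, $g_i(x_i) = x_i - b_i$, and hand-picked values of $\rho_1$, $\rho_2$, $D_\lambda$ and the step sizes. With $P = 1$ the set $\mathcal{D}$ is the interval $[0, D_\lambda]$, so both projections reduce to clipping.

```python
def clip(v, lo, hi):
    """Scalar projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))

N = 4
# Fixed doubly stochastic weights on a 4-node ring (self weight 1/2, each neighbor 1/4).
W = [[0.5 if i == j else 0.25 if (i - j) % N in (1, N - 1) else 0.0
      for j in range(N)] for i in range(N)]
c = 1.0                                # hypothetical target for sum_i x_i
b = [0.2, 0.2, 0.2, 0.2]               # hypothetical local budgets
D_lam, rho1, rho2 = 10.0, 0.05, 0.05   # hypothetical dual bound and perturbation steps

def f(i, x):       # local mapping f_i(x_i) = x_i, so its gradient is 1
    return x

def g(i, x):       # local constraint g_i(x_i) = x_i - b_i, so its gradient is 1
    return x - b[i]

def grad_F(s):     # derivative of F(s) = (s - c)^2
    return 2.0 * (s - c)

def mix(v):        # averaging consensus, step 1, as in (22)
    return [sum(W[i][j] * v[j] for j in range(N)) for i in range(N)]

x = [0.0, 0.3, 0.6, 0.9]
lam = [0.0] * N
y = [f(i, x[i]) for i in range(N)]
z = [g(i, x[i]) for i in range(N)]

for k in range(1, 501):
    a_k = 1.0 / (k + 10)               # diminishing step size
    y_t, z_t, lam_t = mix(y), mix(z), mix(lam)
    x_new, lam_new = [], []
    for i in range(N):
        grad_cost = grad_F(N * y_t[i])                                   # uses N * y_tilde
        alpha = clip(x[i] - rho1 * (grad_cost + lam_t[i]), 0.0, 1.0)     # as in (23a)
        beta = clip(lam_t[i] + rho2 * N * z_t[i], 0.0, D_lam)            # as in (23b)
        x_new.append(clip(x[i] - a_k * (grad_cost + beta), 0.0, 1.0))    # as in (25)
        lam_new.append(clip(lam_t[i] + a_k * g(i, alpha), 0.0, D_lam))   # as in (26)
    y = [y_t[i] + f(i, x_new[i]) - f(i, x[i]) for i in range(N)]         # as in (27)
    z = [z_t[i] + g(i, x_new[i]) - g(i, x[i]) for i in range(N)]         # as in (28)
    x, lam = x_new, lam_new
```

One property worth noting in this sketch is that, because the weight matrix is doubly stochastic, the updates (22), (27) and (28) preserve the sums: at every iteration `sum(y)` equals $\sum_i f_i(x_i)$ and `sum(z)` equals $\sum_i g_i(x_i)$, which is why $N \tilde{y}_i^{(k)}$ and $N \tilde{z}_i^{(k)}$ can serve as local estimates of the global quantities.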

IV. Convergence Analysis

Next, in Section IV-A, we present some additional assumptions on problem (2) and the network 
model. The main convergence results are presented in Section IV-B. 

A. Assumptions 

Assumption 2 (a) The sets $\mathcal{X}_i$, $i = 1, \ldots, N$, are convex and compact. In particular, for
$i = 1, \ldots, N$, there is a constant $D_x > 0$ such that

$$\|x_i\| \le D_x \quad \forall x_i \in \mathcal{X}_i; \tag{29}$$

(b) The functions $f_{i1}, \ldots, f_{iM}$, $i = 1, \ldots, N$, are continuously differentiable;

(c) The constraint functions $g_{i1}, \ldots, g_{iP}$, $i = 1, \ldots, N$, are convex (possibly non-smooth).

Note that Assumption 2(a) and Assumption 2(b) imply that $f_{i1}, \ldots, f_{iM}$ have uniformly
bounded gradients (denoted by $\nabla f_{im}$, $m = 1, \ldots, M$) and are Lipschitz continuous, i.e., for
some $L_f > 0$,

$$\max_{1 \le m \le M} \|\nabla f_{im}(x_i)\| \le L_f \quad \forall x_i \in \mathcal{X}_i, \tag{30}$$

$$\max_{1 \le m \le M} |f_{im}(x_i) - f_{im}(y_i)| \le L_f \|x_i - y_i\| \quad \forall x_i, y_i \in \mathcal{X}_i. \tag{31}$$

Similarly, Assumption 2(a) and Assumption 2(c) imply that $g_{i1}, \ldots, g_{iP}$ have uniformly bounded
subgradients (gradients if they are continuously differentiable) and are Lipschitz continuous, i.e.,
for some $L_g > 0$,

$$\max_{1 \le p \le P} \|\nabla g_{ip}(x_i)\| \le L_g \quad \forall x_i \in \mathcal{X}_i, \tag{32}$$

$$\max_{1 \le p \le P} |g_{ip}(x_i) - g_{ip}(y_i)| \le L_g \|x_i - y_i\| \quad \forall x_i, y_i \in \mathcal{X}_i. \tag{33}$$

In addition, each $f_i$ and $g_i$ are also bounded, i.e., there exist constants $C_f > 0$ and $C_g > 0$
such that for all $i = 1, \ldots, N$,

$$\|f_i(x_i)\| \le C_f, \quad \|g_i(x_i)\| \le C_g \quad \forall x_i \in \mathcal{X}_i, \tag{34}$$

where $\|f_i(x_i)\| = \sqrt{\sum_{m=1}^M f_{im}^2(x_i)}$ and $\|g_i(x_i)\| = \sqrt{\sum_{p=1}^P g_{ip}^2(x_i)}$.

We also need the following assumption on the network utility costs $\mathcal{F}$ and $\bar{\mathcal{F}}$:
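Assumption 3 (a) The function $\mathcal{F}$ is continuously differentiable and has bounded and Lipschitz
continuous gradients, i.e., for some $G_{\mathcal{F}} > 0$ and $L_{\mathcal{F}} > 0$, we have

$$\|\nabla\mathcal{F}(x) - \nabla\mathcal{F}(y)\| \le G_{\mathcal{F}} \|x - y\| \quad \forall x, y \in \mathbb{R}^M, \tag{35}$$

$$\|\nabla\mathcal{F}(y)\| \le L_{\mathcal{F}} \quad \forall y \in \mathbb{R}^M; \tag{36}$$

(b) The function $\bar{\mathcal{F}}$ is convex and has Lipschitz continuous gradients, i.e., for some $G_{\bar{\mathcal{F}}} > 0$,

$$\|\nabla\bar{\mathcal{F}}(x) - \nabla\bar{\mathcal{F}}(y)\| \le G_{\bar{\mathcal{F}}} \|x - y\| \quad \forall x, y \in \mathcal{X}. \tag{37}$$

Note that the convexity of $\bar{\mathcal{F}}$ and Assumption 2(a) indicate that $\bar{\mathcal{F}}$ is Lipschitz continuous, i.e.,
for some $L_{\bar{\mathcal{F}}} > 0$,

$$\|\bar{\mathcal{F}}(x) - \bar{\mathcal{F}}(y)\| \le L_{\bar{\mathcal{F}}} \|x - y\| \quad \forall x, y \in \mathcal{X}. \tag{38}$$

Assumptions 2 and 3 imply that problem (2) is a convex optimization problem. In cases where
$g_{ip}$, $p = 1, \ldots, P$, are smooth, we make the following additional assumption: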

Assumption 4 The functions $g_{ip}$, $p = 1, \ldots, P$, are continuously differentiable and have Lipschitz
continuous gradients, i.e., there exists a constant $G_g > 0$ such that

$$\max_{1 \le p \le P} \|\nabla g_{ip}(x_i) - \nabla g_{ip}(y_i)\| \le G_g \|x_i - y_i\| \quad \forall x_i, y_i \in \mathcal{X}_i. \tag{39}$$

We also have the following assumption on the network model [11], [17]: 

Assumption 5 The weighted graphs $\mathcal{G}(k) = (\mathcal{V}, \mathcal{E}(k), W(k))$ satisfy:

(a) There exists a scalar $0 < \eta < 1$ such that $[W(k)]_{ii} \ge \eta$ for all $i, k$ and $[W(k)]_{ij} \ge \eta$ if
$[W(k)]_{ij} > 0$.

(b) $W(k)$ is doubly stochastic: $\sum_{j=1}^N [W(k)]_{ij} = 1$ for all $i, k$ and $\sum_{i=1}^N [W(k)]_{ij} = 1$ for all $j, k$.

(c) There is an integer $Q$ such that $(\mathcal{V}, \cup_{\ell=1,\ldots,Q}\, \mathcal{E}(k + \ell))$ is strongly connected for all $k$.

Assumption 5 ensures that all the agents can sufficiently and equally influence each other in the
long run.
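
As an illustration of weights satisfying Assumption 5(a) and 5(b) for a single time instant, the sketch below builds a weight matrix with the Metropolis rule on a fixed undirected graph. The Metropolis rule is one common construction and is an assumption here, not something prescribed by the paper; the graph and the function name are hypothetical, and the connectivity requirement in Assumption 5(c) is not checked.

```python
def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic weights for an undirected graph on n nodes."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        # Each link gets weight 1 / (1 + larger of the two endpoint degrees).
        w = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][j] = W[j][i] = w
    for i in range(n):
        # The self weight absorbs the remaining mass so that each row sums to one.
        W[i][i] = 1.0 - sum(W[i])
    return W

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]   # hypothetical 4-node graph
W = metropolis_weights(4, edges)
row_sums = [sum(row) for row in W]
col_sums = [sum(W[i][j] for i in range(4)) for j in range(4)]
eta = min(w for row in W for w in row if w > 0)   # a valid eta for Assumption 5(a)
```

Because the matrix is symmetric with unit row sums, its column sums are also one, and the smallest positive entry gives a value of $\eta$ for which Assumption 5(a) holds on this graph.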

B. Main Convergence Results

Let $A_k = \sum_{\ell=1}^k a_\ell$, and let

$$\hat{x}_i^{(k-1)} = \frac{1}{A_k} \sum_{\ell=1}^k a_\ell\, x_i^{(\ell-1)}, \quad i = 1, \ldots, N, \tag{40}$$

be the running weighted-averages of the primal iterates $x_i^{(0)}, \ldots, x_i^{(k-1)}$ generated by agent $i$
until time $k - 1$. Our main convergence result for Algorithm 1 is given in the following theorem:



Theorem 2 Let Assumptions 1-5 hold, and let $\rho_1 \le 1/(G_{\bar{\mathcal{F}}} + D_\lambda \sqrt{P}\, G_g)$. Assume that the step
size sequence $\{a_k\}$ is non-increasing and such that $a_k > 0$ for all $k \ge 1$, $\sum_{k=1}^\infty a_k = \infty$ and
$\sum_{k=1}^\infty a_k^2 < \infty$. Then, for $\hat{x}^{(k)} = ((\hat{x}_1^{(k)})^T, \ldots, (\hat{x}_N^{(k)})^T)^T$ and $\lambda_i^{(k)}$, $i = 1, \ldots, N$, generated by Algorithm
1 using the gradient perturbation points in (23), we have
i) The sequence $\{\hat{x}^{(k)}\}$ converges to an optimal solution $x^\star \in \mathcal{X}$ of problem (2);
ii) The sequences $\{\lambda_i^{(k)}\}$, $i = 1, \ldots, N$, converge to a common dual optimal solution $\lambda^\star$ of
problem (2).

Theorem 2 indicates that the proposed distributed primal-dual algorithm asymptotically yields 
an optimal primal and dual solution pair for the original problem (2). 

The same convergence result holds if the constraint functions $g_{ip}$, $p = 1, \ldots, P$, are non-smooth
and the perturbation points $\alpha_i^{(k)}$ are computed following (24):

Theorem 3 Let Assumptions 1, 2, 3, and 5 hold, and let $\rho_1 \le 1/G_{\bar{\mathcal{F}}}$. Assume that the step
size sequence $\{a_k\}$ is non-increasing and such that $a_k > 0$ for all $k \ge 1$, $\sum_{k=1}^\infty a_k = \infty$ and
$\sum_{k=1}^\infty a_k^2 < \infty$. Let the sequences $\{\hat{x}^{(k)}\}$ and $\{\lambda_i^{(k)}\}$, $i = 1, \ldots, N$, be generated by Algorithm
1 using the perturbation points in (24) and (23b). Then, $\{\hat{x}^{(k)}\}$ and $\{\lambda_i^{(k)}\}$, $i = 1, \ldots, N$,
converge to an optimal primal solution $x^\star \in \mathcal{X}$ and an optimal dual solution $\lambda^\star$ of problem (2),
respectively.

The proofs of Theorems 2 and 3 are presented in Appendix A and Appendix B, respectively. 

Remark 1 It is worthwhile to note that when the step size $a_k$ has the form of $a/(b + k)$ where
$a > 0$, $b > 0$, one can simply consider the running average below

$$\hat{x}_i^{(k)} = \frac{1}{k} \sum_{\ell=0}^{k-1} x_i^{(\ell)} = \Big(1 - \frac{1}{k}\Big) \hat{x}_i^{(k-1)} + \frac{1}{k}\, x_i^{(k-1)}, \qquad (41)$$

instead of the running weighted-average in (40)^.
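Both averages can be maintained recursively, so an agent does not need to store its past iterates. The following is a minimal Python sketch under the assumption of scalar iterates (vectors would be handled componentwise). The function names are illustrative and do not come from the paper.

```python
def weighted_running_average(iterates, steps):
    """Batch form of the weighted average: sum_l a_l * x_l / sum_l a_l."""
    total_weight = sum(steps)
    return sum(a * x for a, x in zip(steps, iterates)) / total_weight


def update_weighted(avg, weight_sum, a_new, x_new):
    """Recursive form of the weighted average.

    Returns the updated average and the updated sum of step sizes.
    """
    new_sum = weight_sum + a_new
    return avg + (a_new / new_sum) * (x_new - avg), new_sum


def update_uniform(avg, k, x_new):
    """Uniform running average of k iterates from the average of the first k-1."""
    return (1.0 - 1.0 / k) * avg + (1.0 / k) * x_new
```

Starting from `avg = 0.0` and `weight_sum = 0.0`, the first call to `update_weighted` returns the first iterate itself, and `update_uniform` with `k = 1` does the same.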

V. Simulation Results
In this section, we examine the efficacy of the proposed distributed PDP method (Algorithm 1)
by considering the linear sparse regression problem and the demand response control problem
discussed in Section II-B.

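Example 1 (Linear sparse regression): Consider the linear sparse regression problem below

$$\min_{x_1, \ldots, x_N} \ \mathcal{F}(x_1, \ldots, x_N) = \Big\| r - \sum_{i=1}^{N} A_i x_i \Big\|^2 \qquad (42a)$$

$$\text{s.t.} \ \sum_{i=1}^{N} \| x_i \|_1 \le k_0, \qquad (42b)$$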



where $r \in \mathbb{R}^{M}$ and $A_i \in \mathbb{R}^{M \times K}$, $i = 1, \ldots, N$. To generate $r$ and $A_i$, $i = 1, \ldots, N$, we
considered a distributed image sparse decoding task [7]. Specifically, we randomly extracted
3,000 overlapping patches of dimension 8 x 8 from the 512 x 512 BARBARA image, and then
applied the K-SVD algorithm [7] to learn a dictionary $D$ (i.e., the predictor matrix) of size
64 x 900 from the extracted patches. Gaussian noise with zero mean and variance 0.5 was added
to one of the patches, which was then used as the response signal $r$. Two network scenarios
were considered. The first scenario contains 10 agents ($N = 10$), each of which has a regression
variable $x_i$ of dimension 10 ($K = 10$). The predictor matrices $A_1, \ldots, A_{10} \in \mathbb{R}^{64 \times 10}$ were
obtained from the first 100 columns of $D$. In addition, $k_0$ was set to 3. The second scenario has 100
agents ($N = 100$), $K = 9$, $k_0 = 10$, and $D = [A_1, \ldots, A_{100}]$ where each $A_i \in \mathbb{R}^{64 \times 9}$. For both
scenarios, the network graphs $\mathcal{G}$ were randomly generated. Note that, for (42), the associated
proximal perturbation point in (24) has a closed-form solution similar to the soft thresholding
operator in (18), and thus is easy to implement. In addition to the proposed PDP method, we
also implemented the recently proposed distributed (consensus-based) PD subgradient method
in [15] (which does not use perturbation points) for comparison^. We evaluated the normalized

^It can be shown [47] that $\hat{x}^{(k)}$ also satisfies (A.23) in Appendix A-B, and thus Theorems 2 and 3 also hold for $\hat{x}^{(k)}$.
^While the distributed PD method in [15] is not directly applicable to (42) due to the coupled objective function, one can
utilize the linear structure to show that (42) is equivalent to the following saddle point problem (by Lagrange dual)

$$\max_{\lambda \ge 0,\ \mu \in \mathbb{R}^{M}} \ \min_{x_i,\ i=1,\ldots,N} \Big\{ -\frac{1}{4}\|\mu\|^2 + \mu^T \Big( r - \sum_{i=1}^{N} A_i x_i \Big) + \lambda \Big( \sum_{i=1}^{N} \|x_i\|_1 - k_0 \Big) \Big\},$$
accuracy at each iteration k:



Normalized accuracy 



where $\mathcal{F}^\star$ denotes the optimal value of (42), which was obtained by the (centralized) convex
solver CVX [48]. In addition, we computed the primal feasibility of constraint (42b):

I V^A" II (k) II r I 
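The closed-form proximal step mentioned for (42) is described in the text as similar to soft thresholding. Equations (18) and (24) lie outside this section, so the sketch below shows only the textbook soft thresholding operator, i.e., the minimizer of $\tau \|a\|_1 + \frac{1}{2}\|a - v\|^2$, and not necessarily the exact expression used in the paper. The names `soft_threshold`, `v` and `tau` are illustrative assumptions.

```python
def soft_threshold(v, tau):
    """Elementwise soft thresholding: sign(v_j) * max(|v_j| - tau, 0)."""
    out = []
    for vj in v:
        mag = abs(vj) - tau
        if mag <= 0.0:
            out.append(0.0)      # small entries are set exactly to zero
        elif vj > 0:
            out.append(mag)      # positive entries shrink toward zero
        else:
            out.append(-mag)     # negative entries shrink toward zero
    return out
```

Entries whose magnitude does not exceed `tau` become exactly zero, which is the mechanism that promotes sparsity in the iterates.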

Figure 1(a) displays the convergence curves for the first network scenario with N = 10, 
K = 10 and ko = 3. The step size of the distributed PD method in [15] was set to a^ = yq^t^; 
while the step size a^ and parameters pi and p2 of the proposed distributed PDP method were 
set to ttk = Yoo+fe' P'^ ~ '^•^ ^^^ P"^ ~ ^' respectively. Note that these parameters were chosen 
based on cross validation so that each method exhibits its best convergence behavior.
We can observe from Figure 1(a) that, for the proposed distributed PDP method, the
instantaneous iterates oscillate, whereas the running average iterates converge smoothly but more slowly.
We also see from this figure that the proposed distributed PDP method converges faster than
its counterpart without perturbation in [15] (running average iterates). Figure 1(b) and Figure
1(c) respectively show the primal feasibility curves of the proposed distributed PDP method
and the distributed PD method in [15]. We see that the instantaneous iterates of both methods
oscillate and may not be feasible (although they are nearly feasible), whereas, for both methods,
the running average iterates are always feasible.

Figure 1(d) presents the convergence curves for the second network scenario with N = 100, 
K = 9 and ko = 10. The step size was set to a^ = j^r^ for the distributed PD method 
in [15]; while a^, pi and p2 were respectively set to a^ = yo+I' Pi = ^-^ ^^^ P2 = 0.5 for 
the proposed distributed PDP method. Compared with Figure 1(a), we first observe that the
convergence speed of both methods decreases with the network size. Nevertheless, the proposed
distributed PDP method still converges much faster than the method in [15]. In Figure 1(e), we
further present the optimal sparse regression solution of (42) (by CVX) and that obtained by
the proposed distributed PDP method (at iteration 10,000). One can observe that the solution
obtained by the proposed distributed PDP method exhibits a sparsity pattern similar to that of
the optimal solution.

to which the method in [15] can be applied. 
Fig. 1: Convergence curves of distributed methods for the linear sparse regression problem (42),
with $N = 10$, $K = 10$ and $k_0 = 3$.



Example 2 (Smart grid demand response control): This example considers the demand
response control problem presented in (3) and (4). The cost functions were set to $C_p(\cdot) = \pi_p \|\cdot\|^2$
and $C_s(\cdot) = \pi_s \|\cdot\|^2$, respectively, where $\pi_p$ and $\pi_s$ are some price parameters. The load profile
function $\psi_i(x_i)$ is based on the load model in [18], which was proposed to model deferrable,
non-interruptible loads such as electric vehicles, washing machines and tumble dryers.
According to [18], $\psi_i(x_i)$ can be modeled as a linear function, i.e., $\psi_i(x_i) = \Psi_i x_i$, where
$\Psi_i$ is a real coefficient matrix composed of load profiles of appliances of customer $i$. The
real control vector $x_i$ determines the operation scheduling of appliances of customer $i$. Each
$x_i$ is subject to a local constraint set $\mathcal{X}_i = \{x_i \mid A_i x_i \preceq b_i,\ l_i \preceq x_i \preceq u_i\}$ due
to some physical conditions and quality of service constraints [18]. The problem formulation
corresponding to (3) is thus given by

$$\min_{x_i \in \mathcal{X}_i,\ i=1,\ldots,N} \ \pi_p \Big\| \Big( \sum_{i=1}^{N} \Psi_i x_i - p \Big)^{+} \Big\|^2 + \pi_s \Big\| \Big( p - \sum_{i=1}^{N} \Psi_i x_i \Big)^{+} \Big\|^2. \qquad (43)$$



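Analogous to (4), problem (43) can be reformulated as

$$\min_{x_i \in \mathcal{X}_i,\ i=1,\ldots,N,\ z \succeq 0} \ \pi_p \|z\|^2 + \pi_s \Big\| z - \sum_{i=1}^{N} \Psi_i x_i + p \Big\|^2 \qquad (44a)$$

$$\text{s.t.} \ \sum_{i=1}^{N} \Psi_i x_i - p - z \preceq 0, \qquad (44b)$$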



to which the proposed distributed PDP method can be applied. We consider a scenario with
400 customers ($N = 400$), and follow the same methods as in [49] to generate the power
bidding $p$ and coefficients $\Psi_i$, $A_i$, $b_i$, $l_i$, $u_i$, $i = 1, \ldots, N$. The network graph $\mathcal{G}$ was randomly
generated. The price parameters $\pi_p$ and $\pi_s$ were simply set to $1/N$ and $0.8/N$, respectively. In
addition to the distributed PD method in [15], we also compare the proposed distributed PDP
method with the distributed dual subgradient (DDS) method^ [18], [25]. This method is based
on the same idea as the dual decomposition technique [25], where, given the dual variables, each
customer globally solves the corresponding inner minimization problem. The average consensus
subgradient technique [10] is applied to the dual domain for distributed dual optimization.
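The relation between the cost in (43) and its reformulation (44) can be checked numerically. The minimal Python sketch below assumes squared-norm costs and works directly with the mismatch vector $d = \sum_i \Psi_i x_i - p$. For a fixed $d$, the slack $z = d^{+}$ is feasible for (44b) and makes the objective (44a) equal to the cost in (43). All function names are illustrative assumptions.

```python
def positive_part(v):
    """Elementwise max(v_j, 0)."""
    return [max(t, 0.0) for t in v]


def squared_norm(v):
    """Squared Euclidean norm."""
    return sum(t * t for t in v)


def cost_43(d, pi_p, pi_s):
    """Cost of (43), with d = sum_i Psi_i x_i - p (aggregate load minus bid)."""
    over = positive_part(d)
    under = positive_part([-t for t in d])
    return pi_p * squared_norm(over) + pi_s * squared_norm(under)


def cost_44(d, z, pi_p, pi_s):
    """Objective (44a) for a slack z; z must satisfy (44b) and z >= 0."""
    if any(zt < max(dt, 0.0) - 1e-12 for zt, dt in zip(z, d)):
        raise ValueError("z violates (44b) or z >= 0")
    residual = [zt - dt for zt, dt in zip(z, d)]
    return pi_p * squared_norm(z) + pi_s * squared_norm(residual)
```

Any other feasible slack can only increase (44a), because each coordinate of the objective is a convex quadratic in $z$ whose unconstrained minimizer lies at or below $\max(d_j, 0)$.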

Figure 2(a) shows the convergence curves of the three methods under test. The curves shown 
in this figure are the corresponding objective values in (43) of the running average iterates of 
the three methods. The step size of the distributed PD method in [15] was set to a^ = -ttttt and 
that of the DDS method was set to a^ — °°^ 
and p2 were respectively set to a^ = j^ ana pi = p2 
that the proposed distributed PDP method and the DDS method exhibit comparable convergence 
behavior; both methods converge within 200 iterations and outperform the distributed PD method 
in [15]. One should note that the DDS method is computationally more expensive than the proposed

^One can utilize the linear structure to show that (43) is equivalent to the following saddle point problem (by Lagrange dual) 



10+fc 

- ^Q.f^- For the proposed distributed PDP method, a^, pi 
"■^ and pi = P2 = 0.001. From this figure, we observe 



1 2 1 2 T ^ 

A^O, I aSigA^i 47rp 47rs ^ — ' 

rj^O i = l,...,JV ! = 1 



to which the method in [15] and the DDS method [25] can be applied. 
Fig. 2: Numerical results for the smart grid demand response control problem (43) with 400
customers.

distributed PDP method since, in each iteration, the former must globally solve the inner
minimization problem, whereas the latter takes only two primal subgradient updates.

In Figure 2(b), we further display the load profiles of the power supply, unscheduled load 
(without demand response control), and the load scheduled by the proposed distributed PDP 
method. The results were obtained by combining the proposed distributed PDP method with 
the certainty equivalent control (CEC) approach in [18, Algorithm 1] to handle a stochastic 
counterpart of problem (43). The stopping criterion was set to the maximum iteration number of 
500. We can observe from this figure that the power balancing can be much improved compared 
to that without demand response control. Specifically, the cost in (43) is 4.49 x lO'' KW for the 
unscheduled load whereas that of the load scheduled by the proposed distributed PDP method 
is 2.44 X 10^ KW (45.65% reduction). The cost for the load scheduled by the distributed DDS 
method is slightly lower which is 2.38 x 10"^ KW; whereas that scheduled by the distributed PD 
method in [15] has a higher cost of 3.81 x 10"^ KW. 

VI. Conclusions and Future Works

In this paper, we have presented a distributed consensus-based PDP algorithm for solving the 
problem formulated in (2), which has a globally coupled cost function and inequality constraints. 
The algorithm employs the average consensus technique and the primal-dual perturbed (sub-) 
gradient method. We have provided a convergence analysis showing that the proposed algorithm 
enables the agents across the network to achieve a global optimal primal-dual solution of 
the considered problem in a distributed manner. Moreover, the effectiveness of the proposed 



DRAFT 



April 23, 2013 



21 

algorithm has been demonstrated by applying it to a sparse linear regression problem and a
smart grid demand response control problem. In particular, the proposed algorithm is shown to
have better convergence properties than the distributed PD method in [15], which does not use
perturbation. In addition, the proposed algorithm performs comparably with the distributed dual
subgradient method [25] for the demand response control problem, even though the former is
computationally cheaper.

There are several interesting research directions to pursue in the future. One direction is
to extend the algorithm to asynchronous network models such as those considered in [50],
[51]. Another direction is to study the convergence rate of the proposed PDP algorithm.
In addition, the stopping criterion currently used is a maximum iteration number. It would
be interesting to study more advanced distributed stopping criteria (e.g., based on the primal-dual
optimality conditions) so that the algorithm can terminate appropriately in a distributed manner.

Appendix A 
Proof of Theorem 2 

A. Preliminaries 

Three key lemmas that will be used in the proof are presented first. The first is a deterministic 
version of the lemma in [52, Lemma 11, Chapter 2.2]: 

Lemma 1 Let $\{b_k\}$, $\{d_k\}$ and $\{c_k\}$ be non-negative sequences. Suppose that $\sum_{k=1}^{\infty} c_k < \infty$ and

$$b_k \le b_{k-1} - d_{k-1} + c_{k-1} \quad \forall\, k \ge 1,$$

then the sequence $\{b_k\}$ converges and $\sum_{k=1}^{\infty} d_k < \infty$.
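The summability claim in Lemma 1 follows from telescoping the recursion, which gives $\sum_{k} d_k \le b_0 + \sum_{k} c_k$. The short Python sketch below illustrates this on one hypothetical choice of sequences, $c_k = 1/k^2$ and $d_k$ a fixed fraction of $b_k$, with the recursion holding with equality. These choices and the function name are assumptions made only for illustration.

```python
def simulate_lemma1(b0, d_frac, num_steps):
    """Run b_k = b_{k-1} - d_{k-1} + c_{k-1} with c = 1/k^2 and d = d_frac * b.

    Requires 0 < d_frac < 1 so that every b_k stays non-negative.
    Returns the list of b values and the partial sums of d and c.
    """
    b = b0
    sum_d = 0.0
    sum_c = 0.0
    history = [b]
    for k in range(1, num_steps + 1):
        c = 1.0 / (k * k)   # summable perturbation term
        d = d_frac * b      # non-negative decrease term
        b = b - d + c
        sum_d += d
        sum_c += c
        history.append(b)
    return history, sum_d, sum_c
```

In this run the partial sums of `d` never exceed `b0` plus the partial sums of `c`, and consecutive values of `b` settle down, in line with the two conclusions of the lemma.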

Moreover, by extending the results in [17, Theorem 4.2] and [11, Lemma 8(a)], we establish 
the following result on the consensus of {A- '}, {yl '}, and {z- ''} among agents. 

Lemma 2 Suppose that Assumptions 1, 2 and 5 hold. If {ak} is a positive, non-increasing 
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sequence satisfying YlT=i ^l < ^^' ^^^'^ 

oo 

y afcllAf) - A(^^)|| < oo, lim ||Af) - A«|| = 0, 

fc=l 

oo 

y a,||Af ) - X^'-'^W < oo, lim ||Af) - \^'-'^\\ = 0, 

fc=l 

oo 

•^— ' fc— s-oo 

A;=l 

oo 

Vafcllif -i^'^^i)!! <oo, lim ||iP-i('=-i)|| =0, 

•^-— ^ fc— >-oo 

fc=l 
for all i = 1, . . . , N, where

N N N 

i=l j=l i=l 

The proof is omitted here due to limited space. Lemma 2 implies that the local variables A^- , 
y\ and z\ at distributed agents will eventually achieve consensus on the values of A'^'^^ y^^^ 
and z^'^\ respectively. 

The next key lemma will show that the local perturbation points oq ' and /3| in (23) and 
(24) will also achieve some consensus asymptotically. In particular, following (14), we define 

«f^ = V;,.X^t'^ - p,[yff{xt'^)WHNy^'^) + Valixt'^-'^ (A.2a) 

/3W = Pp(A('=-^) + p2 Nz^''^), (A.2b) 

for i = 1, . . . , A^, as the 'centralized' counterparts of (23); similarly, following (15), we define 

af"^ =arg min gf{c^.)X^'~'^ + -^||a, - {xf^ - p,Vff{xf-'^)VHNy^'-'W. (A.3) 

for i = 1, . . . , N, as the centralized counterparts of the proximal perturbation point in (24). We 
show in Appendix C the following lemma: 

Lemma 3 Let Assumptions 2 and 3 hold. For {a- , /3| }^i in (23) and {a.[ , . . . , a.}^' , 0'^''^) 
in (A. 2), it holds that 

||af ) - af II < PiL^v^ll Af ) - A('=-^)|| + p,G^LfV^N\\yl'^ - y^^^^ (A.4) 

0'^-(3l'^\\ < ||Af^-A('^-^)||+p2iV||if -i('^-^)||, (A.5) 
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(k) 

i = 1, . . . , N. Equation (A.4) also holds for the proximal perturbation point olI in (24) and 
af^ in {A3). 

Lemma 3 says that, when A,- , y\ and z\ at distributed agents achieve consensus, each 
a.\ converges to q:| % and all the j3\ converge to the common point ^^^\ 



B. Proof of Theorem 2 

We first show that the primal-dual iterate pairs {x\ , . . . ,x)^ , x'^'^) converge to a saddle 
point of (19), followed by showing that {x[ , . . . , x)^ , A^*^^) satisfies the primal-dual optimality 
conditions in Proposition 1 as A; — )■ oo. The following lemma gives the basic relations for the 
iterates of Algorithm 1. 

Lemma 4 Let Assumptions 2 and 5 hold. Then, for any x = {xj, . . . , a;^)^ G X and \ E V, 
the following two inequalities are true: 

Wx^'^^-xP 



N 



+ alN{VMLfL^ + DxVPL.y + 2afciVD,.yML;G^^ || j^f - y^ 

TV 

+ 2a.D.v^L, Y. (llAf ) - A(^-^)|| + p,N\\zi'^ - zf^w) , (A.6) 

N N y s 

J2 W^t^ - ^f < E 11^?'"'^ - ^11' + 2« J £(aW, A(^-^)) - £(aW, A) ) + alNC^ 
+ 2a,{2p,D^PLl + C,)|| Af ) - A^^-^)]! + Ap.ND^G^^PMLgLfauWyf^ - ^f "'^11- (A.7) 

Proof of Lemma 4: By (25), the non-expansiveness of projection [42], the subgradient bound- 
edness in (30), (32) and (36), and by (20), one can show that, for any x = {xi, . . . , x^) E X, 

N ^ / r n \ 

Y^ ll^f ) -x£=Y: \\V.. U-'^ - a, V/f (.f-^V-^(iV^f ) + ^9li^-'')0t^ ) 
i=l j=l V L J / 

N 

< Y ll^f ~'^ - ^^f + alN{y/MLfL^ + D^VPLgf 

i=l 

N 

-2a, Y^xf-'^ - x.r{yfI{xt'^)VHNyf^) + VgJ{xf-'^)(3f\ (A.8) 
The last term in (A.8) can be further bounded as follows:



N 



N 

i=l 

N 

< -2a, 5^(xf -^) - a:.)^(V/f (a:f "^ V-^(iV^f "^^) + V^f (xf "^))/3(^^)) 

AT JV 



N 



N 



i=l j=l 

AT AT 

+ 2akND,VMLfG^Yl H^'^'^ " ^^'"'^H + '^(^kD.VPLg J^ ||/3f - /jC^)!!, (A.9) 

where the first inequality is obtained by the subgradient boundedness, the Lipschitz continuity 
of VJ-", and the compactness of Xi and V (cf. (11a), (29), (35), (30), (32)), and the second 
inequality is due to the definition of the subgradient of a convex function, i.e., C{x'^^^^\ fi'^^^) + 

{x - a;('=-i))^£^(a;(^-i),/3('^)) < C{x,^^^^) Va; e X. By combining (A.8) and (A.9) and 
applying (A. 5) in Lemma 3, we obtain (A. 6). 

By using (26) and a line of analysis similar to that of the proof of (A. 6), we can obtain, for 
any A G "D, 



N N N N 

Y: II Af - Af < $: II Af) -Xr + alY: M^^^'W + 2a, $:(Af ) - A)-^.(af ) 

1=1 j=l i=l i=l 

N N 

< J2 II Af-^^ - Af + alNC'^ + 2a, 5^(A« - XfgM'^), (A.IO) 




where the last inequality is due to the boundedness of the function values (cf. (34)). We can 
bound the last term as 



N 



N N 



1=1 i=l 

N 



J=l 

N N 



/ TV 

\i=i 

N 

+ 2a, Y^iX^'^-'^ - A)^(^.(af )) - ^.(af ))) + 2a, ^^(Af ^ - A('=-))^g.(«f )) 
<2a,(A('=-)-A)^(5^^.(af)) 

TV 

+ 4D,yPL,a,^||c.f -df))||+2C,a,^||Af)-A('=-i)||. (A.ll) 

1=1 i=l 

where in the last inequality, we have used A^ , A G V and the compactness of V (see (20)), 
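and the Lipschitz continuity of $g_i$ (cf. (33)). Note that, since $\mathcal{L}$ is linear in $\lambda$, we have

$$\mathcal{L}(\hat{\alpha}^{(k)}, \lambda) = \mathcal{L}(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) + (\lambda - \hat{\lambda}^{(k-1)})^T \mathcal{L}_\lambda(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}), \qquad (A.12)$$

where $\hat{\alpha}^{(k)} = ((\hat{\alpha}_1^{(k)})^T, \ldots, (\hat{\alpha}_N^{(k)})^T)^T$ and $\mathcal{L}_\lambda(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) = \sum_{i=1}^{N} g_i(\hat{\alpha}_i^{(k)})$. By combining
(A.10), (A.11), (A.12) and (A.4) in Lemma 3, we obtain (A.7). ■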

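We also need the following lemma which characterizes the relation between the primal-dual
iterates $(x^{(k-1)}, \hat{\lambda}^{(k-1)})$ and the centralized perturbation points $(\hat{\alpha}^{(k)}, \hat{\beta}^{(k)})$ in (A.2):

Lemma 5 Let Assumptions 2, 3 and 4 hold. For the gradient perturbation points $(\hat{\alpha}^{(k)}, \hat{\beta}^{(k)})$
in (A.2), it holds true that

$$\mathcal{L}(x^{(k-1)}, \hat{\beta}^{(k)}) - \mathcal{L}(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) \ge \Big( \frac{1}{\rho_1} - (G_{\mathcal{F}} + D_\lambda \sqrt{P}\, G_g) \Big) \| x^{(k-1)} - \hat{\alpha}^{(k)} \|^2 + \frac{1}{\rho_2} \| \hat{\lambda}^{(k-1)} - \hat{\beta}^{(k)} \|^2. \qquad (A.13)$$

Moreover, let $\rho_1 < 1/(G_{\mathcal{F}} + D_\lambda \sqrt{P}\, G_g)$, and suppose that $\mathcal{L}(x^{(k-1)}, \hat{\beta}^{(k)}) - \mathcal{L}(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) \to 0$
and $(x^{(k-1)}, \hat{\lambda}^{(k-1)})$ converges to some limit point $(x^\star, \lambda^\star) \in \mathcal{X} \times \mathcal{D}$ as $k \to \infty$. Then $(x^\star, \lambda^\star)$
is a saddle point of (19).

The proof is presented in Appendix D. By Lemmas 2, 4 and 5, $(x^{(k)}, \hat{\lambda}^{(k)})$ converges to a
saddle point of (19):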
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Lemma 6 Let Assumptions 2-5 hold, and let pi < \/{Gjr + D\yPGg). Assume that the step 
size Ofc > is a non-increasing sequence satisfying Yl^=i Ofc = oo and Yl'k=i ^l ^ °*^- ^'^^^ 

lim ||ajf^-A*|| =0, z = l,...,Ar, lim IIA^'^) - A*|| = 0, (A.14) 

^11^ ill ' J J J II Jl / \ y 

— ^oo fe— >oo 

lim ||a;(^-^) - a'^''^\ = 0, lim \\\'^''-^^ - /3('^)|| = 0, (A.15) 

fe— >oo fc— >oo 

where x* = {{x\)'^, . . . , (^r^)"^)^ G X and A* G V form a saddle point of problem (19). 

Proof of Lemma 6: By the compactness of the set X and the continuity of the functions T 
and Qi, problem (2) has a solution. By Assumption 1, the dual problem also has a solution. By 
construction of the set V in (20), all dual optimal solutions are contained in the set V. We let 
X* = ((aj*)-^, . . . , (a;^)^)-^ G X and A* G P be an arbitrary saddle point of (19), and we apply 
Lemma 4 with x = {xj, . . . , a;^)^ = x* and A = A*. By summing (A. 6) and (A. 7), we obtain 
the following inequality 

N N 

(Wx'^''^ - X*\\^ ^ ^^ II ^^^'^ - ^^11^^ ^ ni^(fc-l) _ ^*l|2 j_ \^ II XC^-I) _ \*||2 



+ 5^ IIAf ) - Vf ) < (||a:(^-^) - =.1P + 5^ IIAf 
+ Cfc - 2a, ('/:(aj('=^i), /3W) - C{x\ /S^'^^) - £(«('=), A^^^-^^) + £(««, A*)k.l6) 



where 



4 = alN[{VMLfL^ + D^y/PLgf + Cj] 



+ 2[D,yPL, + C, + 2p^PD^L^^ I](afc|| Af ) - A^'^"^)]!) + 2N^LjGr{D^ 

i=l 

N N 

+ 2piDaV^L,) 5^(afc||j/f - y(^-i)||) + 2Np2D,VPL, 5^(afc||if - i^'-^^H). (A.17) 
First of all, by Theorem 1, we have 

/:{d^''\ A*) - C{x\ A*) > 0, C{x\ A*) - C{x\ /3(^)) > 0, 
implying that C{a^''\ A*) — C{x*,f3^^^) > 0. Hence we deduce from (A.17) that 

N N 

(||a;W _ x*f + ^ ||Af ^ - A*|n < (lla;^'^-!) - x*f + ^ ||Af ^'^ - yf) 

4=1 i=l 

+ 4 - 2afc(/:(a;(^^i), /3(^)) - /:(a('^\ A^'^^i))). (A.18) 
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Secondly, by J2T=i ct^ < oo and by Lemma 2, we see that all the four terms in c^ are summable 
over k, and thus ^^^ c^ < oo. Thirdly, by Lemma 5 and under the premise of pi < l/{Gjr + 
DxVPGg), we have C{x^''^^\ P^'''^) - C{a^''\ X^'"'^^) > 0. Therefore, by applying Lemma 1 to 
relation (A. 18), we conclude that the sequence {||a3*^'^^ — a;*||^ + ^^^^ || A^- ' — A*||^} converges for 
any saddle point (a;^ A*), and it holds that Er=i «fc (c{x^'''^\ $^''^) - /:(aW, A^^-^))) < oo. 
Because YlT=i ^^ = oo, the preceding relation implies that 

liminf£(a;('=-i),/3('=))-/:(aW,A('=~i)) = 0. (A.19) 

k—^oo 

Equation (A.19) implies that there exists a subsequence £1,^2, • • • such that 

£(a;(^^"i), /3(^^)) - C{a^^^\ X^^^"^^) ^ as A; ^ 00. (A.20) 

According to Lemma 5, the above equation indicates that 

lim ||a;(^*-^) - A^^^^ll = 0, lim \\X''^^-^'> - ^^^^-^W = 0. (A.21) 

fc— >oo fc— ^-oo 

Moreover, because {(33^^'="^^ A'-^''"^^)} c A" x P is a bounded sequence, there must exist a limit 
point, say (x*, A*) E X x V, such that 



X 



4-1) ^ X*, A(^'="^) -^ A*, as A; ^ cx). (A.22) 



Under the premise of pi < l/{Gjr + D\\fPGg), and by (A.20) and (A.22), we obtain from 
Lemma 5 that {x* , X*) G A" x P is a saddle point of (19). Moreover, because 

N N 

||a;(^^) - i*f + ^ IIAf^) - A*f < Wx^'^^ - i;*f + 5^(||Af^-^ - A(^^)|| + ||A(^^) - A*||)2, 

we obtain from Lemma 2 and (A.22) that the sequence {||a;'^'^) — a;*|p + Ylii=i ll\ ~ -^^IP} has 
a limit value equal to zero. Since the sequence {Haj*^^) ~^*P + X]i=i ll\ •^*ll^} converges for 
any saddle point of (19), we conclude that {||x*^'''^ ~^*P + X]i=i ll\ ~'^*ll^} i'^ f^'^t converges 
to zero, and therefore (A. 14) is proved. Finally, relation (A. 15) can also be obtained by (A. 14), 
(A.18) and (A.13), provided that pi < l/{Gp + DxVPGg). ■ 

According to [47, Lemma 3], if a;*^'^) — )• x* as k — )• 00, then its weighted running average 
x^'''> defined in (40) also converges to jc* as A; — )■ 00. The next lemma further shows that x^'^'^ 
together with A*^'^^ asymptotically satisfy the optimality conditions given by Proposition 1 . 
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Lemma 7 Under the assumptions of Lemma 6, it holds 



lim 

fc— >-oo 



. N .+ / N 



^iia; 



W^ 



0. 



(A.23) 



Proof of Lemma 7: By (A.7) in Lemma 4 and the fact of C{a.^^\ A) = C{a.'^^\ A^^^-^)) + (A 

^(fc-i))T£^^^(fc)^^(fc-i))^ we have 

AT AT \ 

where g{6S^^) = YliLi9i{o^t'^) and 



(A - A(-^))-^(a('=)) < ^ + ^ E ii^f "^ - ^r - E ii^f' - ^1'^ 



i=l 



Cfc ^ a^iVC^^ ^ 2aki2p^DxPLl + C,)||Af - A^^^-^)!! + 4piiVDAG^v/PML,Lyafc||i/f - yf "'^1 

By following a similar argument as in [27, Proposition 5.1] and by (A. 24), (20), (33) and (34), 
one can show that 

1 



Cfc 



:A-AT5(a.(^-^))<^ + -^($:i|A^ 

2ak 2afc \^ ^ 



N N 

(k-l) x||2 V^llxW 



Eii^S' 



i=l 



By taking the weighted running average of (A. 25), we obtain 

(A - yfoiA^'-'^) < ^ E«KA - yfgi^^'-^'^) 



+ 2N^D^L„\\x^^-^^ - Oi^'^'^W + NC„\\\^^~^^ - k'W. (A.25) 



2N^PLhLg^ 

^=1 



A, 



|a;(^-l)_«W|U 



A, 



5^a,||A(^-i)-A* 



11=1 



1 



2A^L)^ 2NVPDxLg 



2Au 



e=i 



Ak 



Ak 



' aJ|a;(^-^)-aW||4^^^ 



E 



=1 



A. 



5^a,||A(^-i)-A1| 

(A.26) 



i=i 



A^(fc-l)^ 



where the first inequality is owing to the fact that g{x) is convex, and the last inequality is 
obtained by dropping — Xli=ill'^i -^IP followed by applying (20). We claim that 



lim e^'^"^^ = 0. 

fc— >oo 



(A.27) 
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To see this, note that the first and second terms in ^('"'^^^ converge to zero as A; — )> oo since 
liuik^oo^k = oo and Yl'eLi'^e < oo. The term -^J2e=i'^{-\\'^^^~^^ ~ '^*\\ ^^^ converges to 
zero since, by Lemma 6, Hmfc_^oo ||A'^'^^ — -^^ll =0 and so does its weighted running average 
by [47, Lemma 3]. Similarly, the term -^J2e=i'^i\\^^^~^'' ~ a.^^^\ also converges to zero since 
limfc^oo ||a;(^-i) - a^^^W = due to (A. 15). 

Now let A = A* + (5 (g^'"""'')) ^hich lies in V, since ||A|| < \\X*\\ +S < Dxhy (21). 

||(3(d-('=-i))) + || II II _ II II - A y K J 

Substituting A into (A.26) gives rise to 

S\\ {g{x^'~''>)y \\ < ^'-'-'\ (A.28) 

As a result, the first term in (A. 23) is obtained by taking A; — )■ cxd in (A.28) and by (A. 27). 

To show that the second limit in (A. 23) holds true, we first let A = A* + 5 „^,, ,,,, G V. By 

11*^ II 

substituting it into (A.26) and by (20), we obtain {X^''~^^)^g{x'^^^^^) < (^) ^^^-^^ which, by 
taking A; — )■ oo, leads to 

limsup (A('=-^))^f/(i(^-^)) < 0. (A.29) 

k—^oo 

On the other hand, by letting X = e V, from (A.26) we have -{X^''~^^)'^g{x^^~'^'^) < 
^(k-i) + (X* - X^'^-^Yoi^^'"^^) < ^^'"^'^ + NCg\\X^''-^'> - X*\\. Since limfc^o^ ^^*^"^^ = and 
limfc^oo II A^''^ — A* II = by Lemma 6, it follows that liminffc_j.oo {X^''~^^)'^ g{x^''~^^) > 0, which 
along with (A.29) yields the second term in (A. 23). ■ 

By Lemma 6, Lemma 7 and Proposition 1, Theorem 2 is thus proved. 

Appendix B 
Proof of Theorem 3 

Theorem 3 essentially can be obtained in the same line as the proof of Theorem 2, except for 
Lemma 5. What we need to show here is that the centralized proximal perturbation point a.^'^'^ 
in (A. 3) and f3^''^ in (A. 2b) and the primal-dual iterates (x^''~^\ X^''^^^) satisfy a result similar 
to Lemma 5. The lemma below is proved in Appendix E: 

Lemma 8 Let Assumptions 2 and 3 hold. For the centralized perturbation points a^^' in (A. 3) 

and (3^^^^ in (A. 2b), it holds true that 

/:(a;('=-i),/3('=))-/:(aW,A('=-i)) 

> M _ ££^ lla.e^-i) _ dWf + lllA^'^-^) - ^^^^\\\ (A.30) 

\2pi 2 y P2 
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Moreover, let pi < l/G^, and suppose that C{x^^-^\ ^^^^)-C{a'-''\X^^-^^) ^ and {x^^-'^\ X^''-^^] 
converges to some limit point (i*, A*) E X x V as k ^ oo. Then (a;*, A*) is a saddle point of 
(19). 

Analogous to Lemma 6, as long as pi < -^, the primal-dual iterates {x^^\ X^^^) converge to a 
saddle point of (19). 



Appendix C 
Proof of Lemma 3 

We first show (A. 5). By definitions in (A. 2b) and (23b), and by the non-expansiveness of 
projection, we readily obtain 



^W_^W||< 



Vv { Ap) + P2 ATzf ) \-Vvi X^'~''> + P2 Nz^'~'^ 



P2 



<||Af^-A('=-^)||+P2iV||iP-i('-')| 



(k) (k) 

Equation (A.4) for the a] in (23a) and al in (A.2a) can be shown in a similar line: 



I (fc) - (fc)> 
\a] - a) 



^^. ( -^t'^ - Pi 



pTfAk-l) 



Vfl{xr")VJ'{Nyn + '^9t{^ 



r.C'h 



.TfAk~l)^Uk) 



- V^^ ( xt'^ - Pi 



< pi||V/f(^r^))||||V^(iVj|f ) - V^(iV^(^"i))|| +Pi||Vg.(ccf-^^ 

< PiL.v^llAf) - A(^^-i)|| +piG^L;v^iV||i,f - y^''~% (A.31) 

where, in the second inequality, we have used the boundedness of gradients (cf. (30), (32)) and 
the Lipchitz continuity of VJ-" (Assumption 3). 

To show that (A.4) holds for ex.- ' in (24) and o:,- in (A. 3), we use the following lemma: 

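Lemma 9 [53, Lemma 4.1] Let $y^\star = \arg\min_{y \in \mathcal{Y}} J_1(y) + J_2(y)$, where $J_1: \mathbb{R}^n \to \mathbb{R}$ and
$J_2: \mathbb{R}^n \to \mathbb{R}$ are convex functions, $\mathcal{Y}$ is a closed convex set, and $J_2$ is continuously
differentiable. Then $y^\star = \arg\min_{y \in \mathcal{Y}} \{ J_1(y) + \nabla J_2^T(y^\star)\, y \}$.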

By applying the above lemma to (24) using Ji{ai) = gf{a.i)X] ' and 



J2(a-i 



2pi 



\cti- {x 



(fc-1) 



fT/^(fe-l) 



p,Vfr{xr'')VnNy 



r,('^)^M|2 




we obtain 

af ) = arg min ^f (a.)Af ^ + (V/f (a.f-^))V-^(iV£^f ) + -(«f - ^f "'^))^«.- (A.32) 
Similarly, applying Lemma 9 to (A. 3), we obtain 

af ) = arg min gf (a.)^^^-^) + (V/f (a.f-^))V^(iVy('^"i)) + l(af ^ - xt'^)fc,,. 

(A.33) 
From (A.32) it follows that 



< g[{a^)X^ + (V/f (a.f "^))V^(iV^f ) + l(af ^ - xf-^)))^af \ 



Pi 

+ 
Pi 



which is equivalent to 



pi 

(A.34) 



Similarly, equation (A.33) implies that 

0<(^f(af))-^f(af)))A('=-) 

+ V/f (a.f-^))V^(iV^('^-^^)(«f ^ - Af ^) + -(«f ^ - =-f-^^)(«f ^ - Af ^ (A.35) 

Pi 

By combining (A.34) and (A.35), we obtain 

— a ' - a 1 < iOi K - 9i a ^ - A^ 
Pi 

+ V/f (^r^))(VJ-(iVj^f ) - V^(iVj,('=-^)))(A« - af )) 

where we have used the boundedness of gradients (cf. (30), (32)), the Lipschitz continuity of 
VJ-" (Assumption 3) as well as the Lipschitz continuity of Qi (in (33)). The desired result in 
(A. 4) follows from the preceding relation. ■ 
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Appendix D 
Proof of Lemma 5 

We first prove that relation (A. 13) holds for the perturbation points q:[ ' and /3*^^) in (A. 2) 
assuming that Assumption 4 is satisfied. Note that (A. 2a) is equivalent to 



af^ = argmin \\(X, - x\'-'^ + p,C^X^^'-'\ \^'-'^)\\\ t = l,...,N, 

where L^X'^^^~^\>S^-^^) = Vff{xf'^^)V7{Ny'^^y) + VgJ{xf~^^)X'^^~^\ By the optimality 
condition, we have that, for all Xi ^ Xi^ 

{X, - &f^f{&f^ - xf^ + pi£.,(x(^"^), AC^"^))) > 0. 

By choosing Xi = x\ '^ ' , one obtains 

.™(^-i) _ ««)Tr ra.(^-i), A(^-i)) > -Wx^'^ - af)|p, 



Pi 



which, by summing over i = 1, . . . ,N, gives rise to 



Pi 



Further write the above equation as follows 



ix- --aW)^£.(AW,A('=-^): 



,(fc-i) 
1 , 



> 



Xik-l)_^{k).2 



> 



Pi 
1 

Pi 



x^'-'^ - a 



«||2_||^(/=-i)_^(fc)|| X ||/:,(a;(^--i),A('="i))-/:,(a('=U^'~'^)||. (A.36) 



By (11), Assumption 3, Assumption 4 and the boundedness of A*^*^ ^^ G V, we can bound the 
second term in (A.36) as 



\CUx^'-'\\^'-'^)-C^{a^'\\'''-'^ 



< ||V-F(a;(^--i)) - V^(a('=))|| + \\\^''~^^\ 



<{G^ + DxVPGg)\\x^'~'^-&^'^l 



.T(Ak-l) 



T/-(fc)^ 



Wgl{xr'>) - VgJia^; 



VgU^t'^)-'^9j^{AP: 



(A.37) 
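where $\| \cdot \|_F$ denotes the Frobenius norm. By combining (A.36) and (A.37), we obtain

$$(x^{(k-1)} - \hat{\alpha}^{(k)})^T \mathcal{L}_x(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) \ge \Big( \frac{1}{\rho_1} - (G_{\mathcal{F}} + D_\lambda \sqrt{P}\, G_g) \Big) \| x^{(k-1)} - \hat{\alpha}^{(k)} \|^2. \qquad (A.38)$$

Since $\mathcal{L}(x^{(k-1)}, \hat{\lambda}^{(k-1)}) - \mathcal{L}(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) \ge (x^{(k-1)} - \hat{\alpha}^{(k)})^T \mathcal{L}_x(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)})$ by the convexity
of $\mathcal{L}$ in $x$, we further obtain

$$\mathcal{L}(x^{(k-1)}, \hat{\lambda}^{(k-1)}) - \mathcal{L}(\hat{\alpha}^{(k)}, \hat{\lambda}^{(k-1)}) \ge \Big( \frac{1}{\rho_1} - (G_{\mathcal{F}} + D_\lambda \sqrt{P}\, G_g) \Big) \| x^{(k-1)} - \hat{\alpha}^{(k)} \|^2. \qquad (A.39)$$

On the other hand, by (A.2b), we know that $\hat{\beta}^{(k)} = \arg\min_{\beta \in \mathcal{D}} \| \beta - \hat{\lambda}^{(k-1)} - \rho_2 \sum_{i=1}^{N} g_i(x_i^{(k-1)}) \|^2$.
By the optimality condition and the linearity of $\mathcal{L}$ in $\lambda$, we have

$$\mathcal{L}(x^{(k-1)}, \hat{\beta}^{(k)}) - \mathcal{L}(x^{(k-1)}, \hat{\lambda}^{(k-1)}) = (\hat{\beta}^{(k)} - \hat{\lambda}^{(k-1)})^T \Big( \sum_{i=1}^{N} g_i(x_i^{(k-1)}) \Big) \ge \frac{1}{\rho_2} \| \hat{\lambda}^{(k-1)} - \hat{\beta}^{(k)} \|^2. \qquad (A.40)$$

Combining (A.39) and (A.40) yields (A.13).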
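The projection step behind (A.40) can be checked numerically. The sketch below assumes that the dual set has the form $\mathcal{D} = \{\lambda \succeq 0 : \|\lambda\| \le D_\lambda\}$, which is suggested by the text but whose definition (20) lies outside this section. For such a set, projecting onto the non-negative orthant and then rescaling into the ball gives the exact projection. The function names and the argument `g_sum`, standing for $\sum_i g_i(x_i^{(k-1)})$, are illustrative.

```python
import math


def project_onto_D(v, radius):
    """Project v onto {lam >= 0, ||lam|| <= radius}.

    Clip to the non-negative orthant, then rescale into the ball if needed.
    """
    clipped = [max(t, 0.0) for t in v]
    norm = math.sqrt(sum(t * t for t in clipped))
    if norm > radius:
        clipped = [t * radius / norm for t in clipped]
    return clipped


def a40_gap(lam_hat, g_sum, rho2, radius):
    """Return rho2 * (beta - lam_hat)^T g_sum - ||beta - lam_hat||^2.

    Here beta is the projection of lam_hat + rho2 * g_sum onto D. The value
    is non-negative whenever lam_hat itself belongs to D.
    """
    shifted = [l + rho2 * g for l, g in zip(lam_hat, g_sum)]
    beta = project_onto_D(shifted, radius)
    diff = [b - l for b, l in zip(beta, lam_hat)]
    inner = sum(d * g for d, g in zip(diff, g_sum))
    return rho2 * inner - sum(d * d for d in diff)
```

Dividing the returned quantity by `rho2` gives the difference between the two sides of the inequality in (A.40), so a non-negative value confirms the bound for that instance.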

Suppose that C^x^'''^^ 0^''^) — C{a^''\ X^'''^^) — > and {x^''~^\ X^''~^^) converges to some 
limit point (a;*, A*) as A; — )• oo. Since pi < l/{Gfr + Dx\/PGg), we infer from (A. 13) that 
||a:;(fc-i) _ Q,(fc)|| _^ and ||A(''"^) - /3('')|| -^ 0, as A; -^ oo. It then follows from (A.2) and the 
fact that projection is a continuous mapping [53] that (x*, A*) E X x V satisfies 

xt = V;,, (xt - Pi[Vf^ix^)VT(^J2 •^*(*^)) + Vfff (ccDAl) , ^ = 1, . . . , AT, 

N 

which, respectively, imply that x* = argminajg;^' £(a3, A*) and A* = argmaxAeo £(»;*, A) i.e., 
(x*, A*) is a saddle point of problem (19). ■ 

Appendix E 
Proof of Lemma 8 

The definition of ck'-'^^ in (A. 3) implies that 

+^ii«f^--r^ir < ^r(xr^)A(-^\ 
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which, by summing over i = 1, . . . , N, yields 



+ ^||c,W-a;('^-i)f < g^(a^W)A('=-i), (A.41) 

2pi 



where gr(Q:*^*^^) = ^i=i gf i^i )• By substituting the decent lemma in [53, Lemma 2.1] 
into (A.41), we then obtain 



,2pi 2 
which, after combining with (A. 40), yields (A. 30). 

(k) 

To show the second part of this lemma, let us recall (A. 33) that al in (A. 3) can be 
altematively written as 

df ) = arg min gf{c..)X^'~'^ + (V/f (xf-^))V^(iV^(^"i)) + -(df ^ - xt'^))^c.,, 

which implies that, for all xi G Xi, we have 

^f (df ))A('=-) + (V/f (xf-^))V^(iV^('=-)) + l(df ) - a.f-^)))^(df - x!t~^) 

Pi 

< gJ{x,)X^'-') + {VfJ{xf-'^)VHNy^'-'') + -(df ) - xf^'^)nx, - xf^ 

Pi 

By summing the above inequality over i = 1, . . . , N, one obtains, for all a; G A", 

y^(dW)V + VJ-^(a^('=-i))(d« - a;(^-i)) + -||dW - icC^-^^f 

Pi 

< g'^ix)^ + Vf'^ix^'-'^)ix ~ x^'-'^) 

+ 1 $](df ) - a.f-^^)(a.. - xf') + (A^ - A(^^^))(^(d(^)) - g{x)) 

on ^ 

< g^{x)x* + :f{x) - :f{x^'-'^) + ±^ V ||df ) - xf^w + 2C,||A'^ - x^'-'^i 

P^ 1^1 
where we have utilized the convexity of F (Assumption 3), boundedness of Xi and the constraint 
functions (cf. Assumption 2 and (34)) in obtaining the last inequality. By applying (A.42) to the 
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above inequality and by the premise of 1/pi > G/- > G^/2, we further obtain, for all a; G A", 

on ^ 



i=l 



N 
Pi 

+ 2Cg|| A* - A('=-i)|| + |£(a;*, A*) - C{a^''\ A*)|, (A.43) 

in which one can bound the last term, using (38), (31), (33) and (20), by 

\C{x\ k") - C{a''''\ \*)\ < {L^ + NDxVPLg)\\x* - a^'^^l (A.44) 

Suppose that ^(a^^''"^^^^'')) — C{ct^''\ X^''~^^) — )■ and {x^''~^\ X^''~^^) converges to some 
limit point {x*, A*) as A; — t- oo. Then, by (A. 30) and since 1/pi > G^, we have || (x^''~^\ A'^'^"^^) — 
(ckW,/3W)|| ^ 0, as A; ^ oo. Therefore, 



lim 

fc— )-oc 



(^E li"^'^ - ^^"11 + 2Gsii^* - ^^'"'^11 + i^i^^'^^n ^A*^n\] = 0. 



Thus, it follows from (A.43), (A.44) and the above equation that C{x*, A*) < C{x, A*) for all 
X E X. The rest of the proof is similar to that of Lemma 5. ■ 